A STUDY OF METHODS FOR SELECTION OF
QUOTIENT DIGITS DURING DIGITAL DIVISION

Department of Computer Science

University  of  Illinois,  1970 


This  study  concerns  a  class  of  non-restoring  division  schemes  in 
which  redundancy  is  introduced  into  the  representation  of  the  quotient  thereby 
permitting  quotient  digits  to  be  selected  from  highly  truncated  versions  of 
the divisor and partial remainders. The mechanism for selection of quotient
digits  is  a  limited  precision  model  of  the  full  precision  division  which  it 
controls  by  the  generation  of  simple  microprogram  instructions.   A  major 
advantage  of  this  approach  to  division  is  a  high  degree  of  congruity  with 
commonly  used  multiplication  structures,  including  those  making  use  of  limited 
propagation  adder-subtracters,  for  example,  carry-save  adders. 

A  cost  versus  performance  analysis  for  a  large  class  of  quotient 
selection  mechanisms  (model  divisions)  is  developed.   The  class  is  defined  in 
terms  of  a  block  diagram  and  a  set  of  ten  design  parameters.   By  varying  the 
structure of the sub-blocks and the values of the parameters, the model
division  scheme  ranges  from  that  of  forming  quotient  digits  by  multiplying  the 
dividend  by  the  inverse  of  the  divisor,  to  that  of  a  direct  table  look-up  of 
the quotient digit. So-called hybrid structures exist between these two cases.
Algorithms are described which synthesize near minimal cost realizations of the
most complicated sub-blocks: a combinatorial logic network to produce
appropriate estimates of the reciprocal of the divisor, and a combinatorial
logic network to generate a quotient digit directly as a function of the bits
in the estimates of the divisor and partial remainder. Formulas are given for
the cost of the remaining sub-blocks. For a given type of structure the primary
determinant of performance is the radix of the model division, $r = 2^k$, where
k is the number of bits of quotient produced per access to the model division.

A FORTRAN implementation of the synthesis routines is used to obtain
the  near  minimal  cost  for  several  different  structures  and  sets  of  design 
parameter  values.   The  numerical  results,  together  with  the  insight  gained  in 
obtaining  them,  are  applied  to  hypothesize  a  formula  for  minimal  cost.   The 
analysis  includes  a  multi-variable  expression  which  relates  cost  to  the  radix 
of the model division, r, the degree of redundancy in the quotient
representation, and the magnitude and direction of the maximum truncation error in the
divisor  and  partial  remainder  estimates.   The  cost  formulas,  together  with 
easily  derived  performance  formulas,  are  used  to  tabulate  expected  cost  and 
performance for a variety of structures. It is found that for most schemes
the cost varies exponentially with performance and, consequently, that many of
the higher radix schemes are not practicable. A radix 4, direct table look-up,
however, can be built with about ten 10-input gates and, assuming 10 ns
logic, could produce 60 bits of quotient in about 4 µs. The study is concluded
with suggestions for further investigation.


1.  INTRODUCTION

1.1  Background

Since division is the mathematical inverse of multiplication, one
might hope that the cost of implementing both a multiplication and a division
operation would not be much different from the cost of implementing
multiplication alone. Furthermore, for a given operand length, one might expect the
execution times for the two operations to be about the same. In actual practice
this hope has not been realized, largely because division, unlike
multiplication, is inherently a trial-and-error process.

In multiplication, a product is accumulated by the successive
addition of multiples of the multiplicand to a partial product. The selection
of which multiple to add is dependent upon a digit, radix r, of the
multiplier, a quantity which is known a priori.

Now consider a recursive relationship for a class of division
techniques based upon subtraction. This relationship is defined by

    $p_{j+1} = r p_j - q_{j+1} d$,    j = 0, 1, ..., m-1        (1.1)

in which

    $p_0$ is the dividend,
    $p_j$ is the partial remainder used in the jth recursion,
    $p_m$ is the remainder,
    j is the recursion index,
    $q_j$ is the jth quotient digit,
    d is the divisor, and
    r is the radix.


In forming the partial remainder, $p_{j+1}$, a multiple of the divisor
is subtracted from the previous partial remainder shifted left by one digital
position. The selection of which multiple to subtract is dependent upon a
digit of the quotient; but it is precisely this quotient digit that we must
compute. It is not known a priori. As it stands this relationship for
division does not adequately specify how $q_{j+1}$ is selected until we add a
restriction such as $|p_{j+1}| < |d|$. The important point here is that division not
only requires an addition or subtraction as in multiplication, but also the
selection of a quotient digit such that the value of the contents of the
accumulator after the subtraction is within a specified range. If it is not
within this range then some correction is required.

Several  effective  techniques  have  been  developed  for  accelerating 
the  execution  of  multiplication.   Foremost  among  them  are  the  following: 

1.  Use  of  adders  or  subtracters  which  postpone  carry  or 
borrow  propagation  until  a  terminal  step. 

2.  The  use  of  a  higher  radix  (greater  than  2)  so  that 
several  bits  of  the  multiplier  are  retired  in  one 
iteration. 

3.  The  introduction  of  redundancy*  into  the  multiplier 
by  multiplier  recoding. 
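
As an illustration of the third technique, the following is a minimal Python sketch of the familiar radix-4 (modified Booth) multiplier recoding, which replaces the digit set {0, 1, 2, 3} by the redundant set {-2, ..., 2}. It is offered only as a well-known example of multiplier recoding; the function names and the restriction to non-negative operands are assumptions made here and are not taken from this study.

```python
def booth_radix4_digits(x: int, num_digits: int) -> list[int]:
    """Recode a non-negative integer into radix-4 digits from {-2, ..., 2}.

    Digits are returned least significant first. The operand is assumed to
    satisfy x < 2**(2*num_digits - 1), so the highest bit examined is zero.
    """
    if not 0 <= x < 2 ** (2 * num_digits - 1):
        raise ValueError("x does not fit in the requested number of digits")
    digits = []
    prev_bit = 0  # the bit to the right of the current pair (b_{-1} = 0)
    for i in range(num_digits):
        low = (x >> (2 * i)) & 1
        high = (x >> (2 * i + 1)) & 1
        # Each digit depends on one overlapping bit from the previous pair.
        digits.append(prev_bit + low - 2 * high)
        prev_bit = high
    return digits


def digits_value(digits: list[int], radix: int = 4) -> int:
    """Evaluate a least-significant-first digit list in the given radix."""
    return sum(d * radix ** i for i, d in enumerate(digits))
```

Each recoded digit selects one of the multiples -2, -1, 0, 1, 2 of the multiplicand, so two bits of the multiplier are retired per iteration without forming the multiple 3.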

The  success  of  such  techniques  in  multiplication  raises  the  question 
of  their  applicability  to  division.   A  significant  contribution  to  the  answer 
was  made  with  the  discovery  of  SRT  division. 


*Redundancy or redundant representation refers to a number representation in
which each radix r digit may assume more than r different values.


In the middle 1950's D. Sweeney of IBM, J. E. Robertson of the
University of Illinois [1]*, and T. D. Tocher [2] of Imperial College,
London, independently discovered a binary division technique especially suited
for implementation in an electronic digital computer. SRT division was named
by C. V. Freiman of IBM in a paper discussing its statistical properties [3],
although an example of the technique may actually have been presented by
Nadler [4] in a 1956 paper describing a computer designed and built by the
Institute of Mathematical Machines of the Czechoslovak Academy of Science
under the direction of Dr. Antonin Svoboda. Whether or not the Nadler work
is equivalent to SRT is obscured by the fact that it is discussed in
conjunction with a stored-carry adder and accumulator.

The basis of SRT division is the discovery that introducing
redundancy into the representation of the quotient yields more freedom in the
selection of a quotient digit at each step of the recursion. In SRT division
this freedom is used to increase the probability of a zero quotient digit, for
which the next partial remainder is produced merely by a shift rather than by
a subtraction followed by a shift. This flexibility is in contrast to
conventional restoring or non-restoring division, which requires a full-precision
subtraction for each quotient bit generated. Even though we are considering
a binary number system, digit values for SRT division are $\bar{1}$, 0, 1 (the
overbar denotes negation, i.e. -1), and thus we have redundancy.
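
The role of the zero digit can be made concrete with a short Python sketch of a radix-2 recursion using the digit set {-1, 0, 1}. This is a textbook-style illustration, not a description of any particular machine: the normalization 1/2 <= d < 1, the comparison constants of 1/2, and the function name are assumptions introduced here.

```python
from fractions import Fraction


def srt_radix2(dividend: Fraction, divisor: Fraction, steps: int):
    """Radix-2 SRT-style division with quotient digits in {-1, 0, 1}.

    Assumes 1/2 <= divisor < 1 and |dividend| <= divisor.
    Returns (digits, remainder), where remainder is the final p_m.
    """
    half = Fraction(1, 2)
    if not (half <= divisor < 1 and abs(dividend) <= divisor):
        raise ValueError("operands outside the assumed range")
    p = dividend
    digits = []
    for _ in range(steps):
        shifted = 2 * p
        if shifted >= half:
            q = 1
        elif shifted < -half:
            q = -1
        else:
            q = 0  # shift only: no addition or subtraction is needed
        p = shifted - q * divisor
        digits.append(q)
    return digits, p
```

The comparisons involve only the constants 1/2 and -1/2, not the divisor itself; it is the redundancy of the digit set that makes such an imprecise choice acceptable while keeping every partial remainder within the divisor in magnitude.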

*Numbers in brackets refer to entries under References.

In 1965, Robertson [5] extended the concepts inherent in SRT
division to higher radix division schemes. The fundamental tenets of the
method remain, namely, that by introducing redundancy into the representation
of the quotient, the selection of a quotient digit at each step of the
recursion need not be precise. For the higher radix cases, a larger set of
quotient digits is necessary and thus the probability of a zero quotient digit
is reduced to the extent that adder bypass no longer yields significant speed
improvement. However, the redundancy may still be put to advantage; it
permits the selection of a quotient digit based only upon high-order digits
of the divisor and high-order digits of the shifted partial remainder.

In reference [5], Robertson introduces the notion of a quotient
selection mechanism with inputs consisting of estimates of the divisor and
shifted partial remainder. He notes that the mechanism for selection of
quotient digits may be thought of as a limited precision model of the full
precision division. The procedures in the model need not be the same as the
procedure of the full precision scheme which it controls. The model division
generates simple microprogram instructions to the full-precision unit. His
paper also presents an indirect, relative measure of the cost of selection of
quotient digits.

The author's Master's thesis in 1967 is based largely upon Robertson's
work as described in references [1] and [5]. The complete thesis,
including an example of an actual implementation of a model division scheme, is
available in report form [6]; the more theoretical aspects of the work are
available in journal form [7]. Implementation is also discussed in a more
recent report in conjunction with the development of the arithmetic units of
the Illiac III Computer [8], [9].

The author's paper [7] is largely tutorial. It presents a detailed
review of Robertson's proof of the validity of the class of division
techniques to which the model division approach is applicable. The proof will not
be repeated in the present work. The paper also describes a graphical
representation, the so-called P-D plot, suggested by C. V. Freiman [5], which is
useful in describing the division procedure, and then develops expressions
for the maximum number of bits of the divisor and partial remainder which must
be inspected in order to determine a correct quotient digit for a given radix,
a given lower limit on the divisor, and a given amount of redundancy* in the
representation of the quotient. These expressions, which provide a
worst-case measure of costs, also account for redundancy in the representation of
the partial remainder such as produced by a member of the family of
carry-save adders or borrow-save subtracters [10], [11], [12].

We  are  now  in  a  position  to  consider  the  design  of  division  schemes 
which are highly compatible with multiplication structures. The model
division determines which multiples of the divisor are to be combined with the
partial  remainder.   In  this  respect  it  is  analogous  to  the  multiplier  recoder 
and  may  be  thought  of  as  a  quotient  recoder.   Multiplier  recoding  logic  is 
usually  entirely  combinatorial  and  grows  in  complexity  only  linearly  with  the 
radix.   The  model  division  is  complicated  by  the  fact  that  the  quotient  digit 
is  a  function  of  both  the  divisor  and  the  partial  remainder  and  the  fact  that 
the  partial  remainder,  unlike  the  divisor  or  multiplier,  is  not  constant 
throughout  a  given  operation.   An  analysis  of  the  growth  of  the  complexity  of 
a  model  division  with  increase  of  radix  is  one  aspect  of  this  thesis  work. 

*A measure of redundancy will be defined later in this paper.

Robertson has made a formal correspondence between multiplier recodings and
quotient recodings produced by SRT division. See Ref. [13].

But despite these complications, the strong analogy between
multiplier recoding and the concept of the model division leads to a division
structure which is potentially highly compatible with a given multiplication
scheme. The difference in the execution time between the iterative portion
of multiplication and division is essentially the difference between the total
time required to recode the multiplier and that to recode the quotient. The
bulk of the logic accounting for the difference in cost of a multiplier and
the cost of a multiplier and divider may then be associated with the cost of
implementing the model division.

1.2  Present  Work 

With this background in mind, we now turn to an introduction to the
present work. Section 2 begins by defining a class of full-precision
multiplication-division structures. We then define a rather general block
structure of a quotient selection mechanism suitable for use as a model
division. The parameters of the model include the radix, the magnitude of
the largest quotient digit, the range of the divisor, and the truncation
error in the estimates of the divisor and partial remainder.

The flexibility of the model division approach and the generality
of the model proposed in Section 2 offer a large number of design
possibilities. A major goal of this work is to investigate the cost versus
performance of various designs and attempt to extract analytic results. Such an
attempt requires the definition of a measure of cost and performance. A
useful cost measurement should, in some sense, be minimal, and therefore we
must consider a minimization criterion and a minimization algorithm. These
topics are discussed in Section 3.

The first approach taken to determining cost and performance of
various quotient selectors is that of computer-aided generation of a specific
design followed by analysis. In Section 4 algorithms are described which,
when supplied with parameter values, will generate logic definitions of the
sub-blocks of the model. Most of the logic will be defined in a minimal
sum-of-products form which could serve as input to a logic design program
customized for a given class of logic.

To  this  point  we  will  have  developed  a  mechanism  for  generating 
and  comparing  various  designs  for  a  model  division.   The  approach  has  been 
one  of  computer-aided  design  followed  by  computer-aided  minimization.   The 
results  from  the  computer  work  are  tabulated  in  Section  5.   Although  the 
design  and  minimization  programs  are  quite  efficient,  the  large  number  of 
design  possibilities  together  with  the  large  number  of  terms  in  the  logic 
equations  for  the  higher  radix  models  strongly  discourages  an  exhaustive 
analysis.   An  additional  result  described  in  Section  5  has  been  insight  which 
led  to  development  of  analytic  expressions  for  the  cost  of  a  structure. 

Section  6  is  a  tabulation  of  estimates  of  cost  and  performance 
based  upon  the  equations  and  computer  generated  results  described  in  Section 
5. The final section is a summary and some conclusions as to the relative
merits  of  various  members  of  the  family  of  model  division  schemes  considered. 
The  section  includes  a  list  of  suggestions  for  further  investigation. 


2.   DEFINITION  OF  THE  DIVISION  PROCEDURE 

2.1  Formal  Definition  of  the  Full  Precision  Division 

The members of the class of division algorithms which may be
employed to perform the full-precision division are those defined by the
recursive relationship (1.1) and the list of restrictions given below. The
recursive relationship is repeated here for convenience.


    $p_{j+1} = r p_j - q_{j+1} d$,    j = 0, 1, ..., m-1        (1.1)

in which

    $p_0$ is the dividend,
    $p_j$ is the partial remainder used in the jth recursion,
    $p_m$ is the remainder,
    j is the recursion index,
    $q_j$ is the jth quotient digit (radix r),
    d is the divisor, and
    r is the radix.

The quantity $r p_j$ is referred to as the shifted partial remainder.

Restrictions which apply are as follows:

1.  Allowable quotient digits are

        -n, -n+1, -n+2, ..., 0, 1, 2, ..., n, where

    n is an integer such that $n \ge (r-1)/2$.        (2.1)

2.  The dividend, $p_0$, must be in the range defined by

        $|p_0| < \rho |d|$        (2.2)

    where $\rho = n/(r-1)$.        (2.3)

3.  The divisor must be within a given range, i.e. the
    quantities a and b must be defined such that

        $a \le |d| \le b$.        (2.4)

4.  Every quotient digit, $q_{j+1}$ for j from 0 through m-1,
    must be chosen such that $p_{j+1}$ as defined by (1.1) is
    within the range

        $|p_{j+1}| \le \rho |d|$.        (2.5)

The derivation of these restrictions is given in
[6] and [7]. Note that the forms of rp and d have
not been limited to non-redundant representations.
They may be in forms such as produced by carry-save
adders or borrow-save subtracters.
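
The recursion (1.1) and restrictions (2.1) through (2.5) can be exercised with a small Python sketch. It uses exact rational arithmetic and chooses each digit from the full-precision quotient $r p_j / d$, rounded and limited to the digit set; this exact selection rule is an assumption made for illustration only and is not the estimate-based selection studied in this work. The function name is hypothetical.

```python
from fractions import Fraction


def redundant_divide(p0, d, r, n, m):
    """Run recursion (1.1) for m steps with digits in {-n, ..., n}.

    Returns (digits, p_m). Each digit is the rounded full-precision
    quotient r*p_j/d, limited to the allowed digit set.
    """
    p, d = Fraction(p0), Fraction(d)
    rho = Fraction(n, r - 1)                      # (2.3)
    if 2 * n < r - 1:
        raise ValueError("restriction (2.1) violated")
    if d == 0 or abs(p) >= rho * abs(d):
        raise ValueError("restriction (2.2) violated")
    digits = []
    for _ in range(m):
        shifted = r * p                           # shifted partial remainder
        q = max(-n, min(n, round(shifted / d)))   # digit from the allowed set
        p = shifted - q * d                       # recursion (1.1)
        if abs(p) > rho * abs(d):                 # restriction (2.5)
            raise AssertionError("partial remainder out of range")
        digits.append(q)
    return digits, p
```

Because $\rho \ge 1/2$ whenever (2.1) holds, rounding to the nearest allowed digit always leaves the next partial remainder inside the range (2.5). The quotient is recovered as the sum of $q_j r^{-j}$, and the dividend equals the divisor times the quotient plus $p_m r^{-m}$.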

2.2  Graphical  Representation  of  the  Division  Procedure 

This division procedure may be defined graphically with a
construction suggested by C. V. Freiman [5]. The basis for the construction is
the recursive relationship (1.1) together with the range restriction (2.5).
The figure is essentially a plot of partial remainder versus divisor values
and is thus designated a P-D plot.

Solving the recursive relationship for $r p_j$ yields
$r p_j = p_{j+1} + q_{j+1} d$. For a fixed quotient digit, the upper limit of
$r p_j$ as a function of the divisor, d, occurs when $p_{j+1}$ is maximum,
i.e., when $p_{j+1} = \rho d$, and thus

    $r p_{j\,max} = (\rho + q_{j+1}) d$.        (2.6)

Likewise the lower limit is defined by

    $r p_{j\,min} = (-\rho + q_{j+1}) d$.        (2.7)

These linear functions of d may be plotted as a family of curves with
$q_{j+1}$ as a parameter ranging from -n through +n in steps of 1. The area
between $r p_{j\,max}$ and $r p_{j\,min}$ for a given $q_{j+1} = i$ will be
denoted the "q(i) region."

For given r, n, a, and b, the division procedure is specified by
the corresponding P-D plot. A given value of d and $r p_j$ will specify a point
in a q(i) area. The quotient digit $q_{j+1}$ is therefore i and is used in
forming $p_{j+1}$.

Figure 1 is an example of a P-D plot with r = 4, n = 2, a = 1/2
and b = 1. The equations for the lines denoted 2', 2, etc. are defined in
Table 1. Note that as a consequence of the redundancy introduced into the
representation of the quotient there is overlap between adjacent quotient
regions. Some pairs (d, $r p_j$) will specify a point for which either
$q_{j+1} = i$ or $q_{j+1} = i - 1$ is a valid choice. It is this overlap which permits
quotient digit selection to be made on the basis of estimates of the full
precision divisor and shifted partial remainder.
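
The region boundaries (2.6) and (2.7) translate directly into a membership test. The Python sketch below, with a hypothetical function name and assuming a positive divisor, lists every digit i whose q(i) region contains a given point of the P-D plot.

```python
from fractions import Fraction


def valid_digits(rp, d, r, n):
    """Return all digits i whose q(i) region contains the point (d, rp).

    Uses the bounds (i - rho) * d <= rp <= (i + rho) * d from (2.6) and
    (2.7); the divisor d is assumed to be positive.
    """
    if d <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    rho = Fraction(n, r - 1)
    return [i for i in range(-n, n + 1)
            if (i - rho) * d <= rp <= (i + rho) * d]
```

For r = 4 and n = 2, the point d = 3/4, rp = 3/8 lies in both the q(0) and q(1) regions, whereas d = 3/4, rp = 3/4 lies only in q(1). Points of the first kind are where the overlap leaves a choice of digit.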


Figure 1. P-D Plot with r = 4, n = 2

    $r p_j = \pm \frac{n}{r-1} d + q_{j+1} d$

    Designation
    in Figure 1    $q_{j+1}$     $\pm \frac{n}{r-1} d$    Equation $r p_j =$

    2'             2              2/3 d                    8/3 d
    2              2             -2/3 d                    4/3 d
    1'             1              2/3 d                    5/3 d
    1              1             -2/3 d                    1/3 d
    0'             0              2/3 d                    2/3 d
    0              0             -2/3 d                   -2/3 d
    $\bar{1}$'     $\bar{1}$      2/3 d                   -1/3 d
    $\bar{1}$      $\bar{1}$     -2/3 d                   -5/3 d
    $\bar{2}$'     $\bar{2}$      2/3 d                   -4/3 d
    $\bar{2}$      $\bar{2}$     -2/3 d                   -8/3 d

Table 1. Equations Defining the Regions of Figure 1.


2.3  Formal Definition of the Quotient Selection Procedure

The quotient selection mechanism may be defined as a device that,
when given estimates of the divisor and shifted partial remainder of
"sufficient" precision, will produce a quotient digit, i, such that restriction
(2.5) is satisfied. The definition of sufficient precision is given in
the following.

With a, b, n, and r given, the P-D plot is specified. Let D be the
set of all divisor values for a given operand length and range specified by
(a, b). Let P be the set of all values of allowable shifted partial
remainders. The area of the P-D plot is the Cartesian product of P and D, i.e. the
area is the set

    $P \times D = \{(rp, d) \mid rp \in P \text{ and } d \in D\}$.        (2.8)

Every element of P x D is contained in one or more q(i) regions;
thus each element implies a set I = {i | (rp, d) is within the q(i) region}.
In Figure 1, every pair (d, rp) will be contained in either one or two q(i)
regions. This will be the case for all examples discussed in this study;
however, for $\rho = n/(r-1)$ greater than 1, a given (d, rp) would be contained
within two or more q(i) regions.

The inputs to the quotient selection mechanisms are estimates of the
divisor and shifted partial remainder. Let $\hat{d}$ and $\widehat{rp}$ denote
these estimates, respectively, and let $Q(\widehat{rp}, \hat{d})$ be the output
of the quotient selection mechanism (a quotient digit) for given estimates
$(\widehat{rp}, \hat{d})$.

The set of $\widehat{rp}$ and $\hat{d}$ values are of sufficient precision and the
quotient selection procedure is correct if for every pair (rp, d) in P x D,
there exists an ordered pair $(\widehat{rp}, \hat{d})$ such that
$Q(\widehat{rp}, \hat{d}) = i$, where i belongs to the set I implied by (rp, d).

In actual practice, $\hat{d}$ and $\widehat{rp}$ are formed by uniformly truncating, or
truncating and rounding, d and rp, respectively. Assume that a binary
representation of d is truncated between position $\delta$ and $\delta + 1$ to the right of the
binary point, and that a binary representation of rp is truncated between
position $\epsilon$ and $\epsilon + 1$ to the right of the binary point. Let

    $\Delta d = 2^{-\delta}$, and        (2.9)

    $\Delta rp = 2^{-\epsilon}$.        (2.10)

The set of $\hat{d}$ values are therefore integer multiples of $\Delta d$ and the
set of $\widehat{rp}$ values are integer multiples of $\Delta rp$. A given value of $\hat{d}$ is
representative of the range of full precision divisor values given by

    $\hat{d} - \alpha \le d \le \hat{d} + \beta$,        (2.11)

    where $\alpha = \alpha' \Delta d$        (2.12)

    $\beta = \beta' \Delta d$.        (2.13)

Similarly, $\widehat{rp}$ is representative of the range of full precision shifted partial
remainders in the range

    $\widehat{rp} - \lambda \le rp \le \widehat{rp} + \gamma$,        (2.14)

    where $\lambda = \lambda' \Delta rp$, and        (2.15)

    $\gamma = \gamma' \Delta rp$.        (2.16)

The quantities $\alpha'$, $\beta'$, $\lambda'$, and $\gamma'$ are in the range 0 to 2 and depend
upon the truncation procedure and the form of representation of d and rp.
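
The ranges (2.11) and (2.14) define a rectangle of the P-D plot that a single pair of estimates must represent. The Python sketch below, with hypothetical names, returns the digits that are valid for every point of that rectangle; an empty result means the estimates are not of sufficient precision for that pair. It assumes a positive divisor, and it relies on the fact that each q(i) region is convex for d > 0, so checking the four corners is enough.

```python
from fractions import Fraction


def cell_digits(rp_hat, d_hat, alpha, beta, lam, gamma, r, n):
    """Digits valid for all (rp, d) represented by the estimates.

    The estimates stand for d_hat - alpha <= d <= d_hat + beta  (2.11)
    and rp_hat - lam <= rp <= rp_hat + gamma                    (2.14).
    """
    rho = Fraction(n, r - 1)
    d_lo, d_hi = d_hat - alpha, d_hat + beta
    rp_lo, rp_hi = rp_hat - lam, rp_hat + gamma
    if d_lo <= 0:
        raise ValueError("this sketch assumes a positive divisor range")
    corners = [(rp, d) for rp in (rp_lo, rp_hi) for d in (d_lo, d_hi)]
    return [i for i in range(-n, n + 1)
            if all((i - rho) * d <= rp <= (i + rho) * d
                   for rp, d in corners)]
```

As a usage example for r = 4 and n = 2 with simple truncation (so that $\alpha = \lambda = 0$): with $\Delta d = \Delta rp = 1/8$ the estimates $\hat{d} = 1/2$, $\widehat{rp} = 1/2$ admit the digit 1, while with $\Delta d = \Delta rp = 1/2$ the same estimates admit no digit at all.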


2.4  Physical Model of the Quotient Selection Mechanism

We now turn to the question of the physical realization of a
quotient selection mechanism: the device which performs the operation
$\widehat{rp}/\hat{d}$ to produce the quotient digit, i, such that i belongs to the
set of quotient digits, I, implied by (rp, d). Since the operation time of
division relative to multiplication is limited by the model division, the
requirements for a high performance arithmetic processor would demand the
design of a high-speed model division. One way to achieve this would be to
use a higher-speed class of logic in building the model division than in
building the remainder of the arithmetic processor, but in this work we are
assuming one given class of logic and are constraining the design problem
such that speed advantages must be gained by organization.

Any  valid  division  technique  is  a  candidate  for  a  model  division. 
One  aspect  of  this  study  was  a  survey  of  known  division  techniques  suitable 
for implementation in a digital computer. References [14] through [32] are
some  of  the  works  consulted.   In  evaluating  possible  candidates  we  should 
keep  in  mind  the  advantages  of  dealing  with  relatively  short  operands  coupled 
with  the  potential  requirement  for  low  operating  times. 

Digital division schemes may be classified as additive,
multiplicative, tabular or some combination of the three. Additive techniques are
those such as restoring and non-restoring division in which addition and
subtraction are the fundamental operations; the divisor remains invariant.
Multiplicative schemes are those in which both the dividend and divisor are
multiplied by factors in such a manner that the modified divisor converges to
1 and thus the modified dividend converges to the quotient. Tabular
techniques are those based upon a combinational network: the quotient digit is
produced by table look-up. Note that neither of the two latter techniques
produces a remainder but that a remainder is not needed for a model division.
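
The multiplicative class can be illustrated with a brief Python sketch. The choice of the factor 2 - d at each step is an assumption taken from the familiar convergence-division scheme; the text above does not prescribe how the factors are obtained, and the function name is hypothetical.

```python
from fractions import Fraction


def convergence_divide(dividend, divisor, iterations):
    """Multiply dividend and divisor by common factors until divisor -> 1.

    Assumes 0 < divisor < 2. Each factor is 2 - d, which squares the
    distance of the modified divisor from 1 on every iteration.
    Returns (modified dividend, modified divisor).
    """
    n, d = Fraction(dividend), Fraction(divisor)
    if not 0 < d < 2:
        raise ValueError("this sketch assumes 0 < divisor < 2")
    for _ in range(iterations):
        factor = 2 - d
        n *= factor   # modified dividend approaches the quotient
        d *= factor   # modified divisor approaches 1
    return n, d
```

With dividend 3/8 and divisor 3/4, three iterations bring the modified divisor to 65535/65536 and the modified dividend to within 1/131072 of the quotient 1/2. No remainder is produced, in line with the remark above.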

We have eliminated analog schemes and threshold logic from
consideration in this study. We have also ruled out logarithmic techniques since,
although the division is transformed to a subtraction, the equipment-time
ratio suffers due to the necessity for forming logs and antilogs.

We now propose a generalized structure into which will fit
multiplicative and tabular techniques. These schemes appear to have a potential
for higher operating speeds than the additive techniques. Since it is an
additive (non-restoring) scheme which is controlled with the model division
it seems intuitively justifiable to consider a higher performance class for
the model. Emphasis on hardwired table look-up techniques is also justified
by trends of technology towards LSI.

Figure 2 is the generalized structure. The parameters and blocks
are as follows:

Divisor Estimate Formation - This block accepts a full precision
divisor, d, in the range $a \le d < b$, and from it produces an estimate of the
divisor, $\hat{d}$, with maximum negative uncertainty, $\alpha$, and maximum positive
uncertainty, $\beta$. This box may also incorporate provisions for changing the form
of representation of $\hat{d}$ from that of d. For example, if the model division
structure accepts only positive quantities, but d is in both negative and
positive range, this box could convert d to a sign and magnitude form. The
magnitude would serve as input to the model. The sign would be used together
with the sign of the partial remainder in determining the sign of the
quotient digit. This block is part of the interface between the full precision
structure and the model division.

In addition to this interfacing function, the divisor estimate
formation box also serves as a selector. Note that the output of Multiplier 2
is coupled back into this box. This feedback loop together with the one from
Multiplier 1 to the partial remainder estimate formation box admits iterative
multiplicative schemes into this structure.

Table 1 - This block accepts $\hat{d}$ as the input and produces a value,
A, as a function of $\hat{d}$, i.e. $A = f(\hat{d})$. The quantity A is a factor by which
both $\hat{d}$ and $\widehat{rp}$ are multiplied (the quotient is therefore not changed). It may
be interpreted as a scale factor used to transform the range of d or as an
estimate of the inverse of d.

Partial Remainder Estimate Formation - This block accepts a full
precision shifted partial remainder, rp, and from it produces an estimate of
the shifted partial remainder, $\widehat{rp}$, with maximum negative uncertainty, $\lambda$, and
maximum positive uncertainty, $\gamma$. As with divisor estimate formation, the
estimate is in practice a truncated version of the full precision quantity.
The block may also incorporate provisions for changing the form of the
representation.

In actual implementations the full precision remainder may be the
result of operations using an adder-subtracter which produces a redundant
representation. The estimate of the remainder, $\widehat{rp}$, however, is restricted to
non-redundant representations. We have assumed, although not rigorously
demonstrated the fact, that use of a redundantly represented value would
unduly complicate the structure of the quotient selection mechanism. Merely
determining the sign, for example, is of the same order of complexity as
converting the value into a non-redundant form. It is important to realize,
however, that the estimate consists of only the high order digits of the full
precision remainder. In practice this estimate is sufficiently short to
permit conversion to a non-redundant form using full-lookahead techniques.

The partial remainder estimate formation block also enables the
output of Multiplier 1 to couple back into the input side. As with the
divisor loop, this path is necessary for the inclusion of the iterative
multiplicative division scheme.

Multipliers - Multiplier 1 and Multiplier 2 form, respectively, the
quantities P = A rp and D = A d. The outputs of both multipliers are the
inputs to the second table look-up structure, Table 2. P may be thought of
as a transformed version of rp. The maximum negative uncertainty in P is
$\lambda A_{max}$; the maximum positive uncertainty is $\gamma A_{max}$, where $A_{max} = f(b)$.
If the product, A rp, is truncated so that non-zero digits are lost, additional
uncertainties $\lambda_m$ and $\gamma_m$ are introduced. In this case P represents
transformed rp values in the range

$$ P - A\lambda - \lambda_m \le A\,rp \le P + A\gamma + \gamma_m. \tag{2.17} $$

Similarly, the maximum uncertainties in D are $\alpha A_{max}$, $\beta A_{max}$ with $A_{max} = f(b)$.
If D is truncated with maximum truncation errors $(\alpha_m, \beta_m)$ then D is
representative of transformed d values in the range

$$ D - A\alpha - \alpha_m \le A\,d \le D + A\beta + \beta_m. \tag{2.18} $$
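
The ranges (2.17) and (2.18) have the same form, so a single helper can illustrate both. The following is a minimal sketch only; the function and argument names are hypothetical and exact rational arithmetic is an assumption made for clarity.

```python
from fractions import Fraction

def transformed_range(product, A, neg_unc, pos_unc, neg_trunc=0, pos_trunc=0):
    """Return (low, high) bounding the true transformed value.

    For (2.17): product = P, neg_unc = lambda, pos_unc = gamma,
    neg_trunc = lambda_m, pos_trunc = gamma_m, and the bounds apply to A*rp.
    For (2.18): product = D with alpha, beta, alpha_m, beta_m, bounding A*d.
    """
    product, A = Fraction(product), Fraction(A)
    low = product - A * Fraction(neg_unc) - Fraction(neg_trunc)
    high = product + A * Fraction(pos_unc) + Fraction(pos_trunc)
    return low, high
```

With no truncation of the product, the two truncation arguments are left at zero and the interval reduces to the scaled uncertainties of the estimate.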

Table  2  -  This  structure  is  an  implementation  of  the  function 
which  relates  quotient  digits,  q,  to  the  products  P  and  D,  the  scaled 
remainder  and  divisor,  respectively,  for  the  model  division. 

Quotient Recode - The quotient recode block represents the interface
between the output of the model division and the full precision division.
The output of Table 2, q, may require a recoding into a form directly
usable by the shift gate complex which selects the next multiple of the
divisor to be used in forming the subsequent partial remainder.

At this point we narrow the scope of the present research to exclude
iterative  multiplicative  schemes:    the  feedback  loops  of  Figure  2  will  not 
be  used.   The  remaining  structure  includes  what  might  be  considered  two 
extremes  or  boundary  cases.   In  the  one  structure,  to  be  designated  Type  1, 
Table  1  is  defined  such  that  the  rounded,  integer  portion  of  the  product  A  rp 
is  the  correct  quotient  digit  for  the  division,  rp/d.   For  a  Type  1  structure 
neither  Table  2  nor  Multiplier  2  need  be  implemented.   The  other  extreme 
occurs  when  A  =  f(d)  =  1.   In  this  case,  designated  Type  2,  Table  2  bears  the 
full  burden  of  quotient  selection  and  neither  Table  1  nor  the  multipliers 
are  required. 

But there are also intermediate, hybrid structures in which neither
Table 1 nor Table 2 is degenerate. In these structures Table 1 and the
multipliers are used to transform d into A d, whose range is closer to 1 than
that of d. The effect of this range transformation is to simplify Table 2. In the next
chapter  we  shall  examine  the  design  of  Table  1  and  Table  2  independently  and 
then  make  some  observations  about  hybrid  structures.   The  shift  from  a  Type  1 
structure  to  a  Type  2  structure  and  accompanying  trade-off  between  speed  and 
hardware  is  but  an  example  of  the  trade-offs  available  between  sequential 
networks  and  their  combinatorial  equivalent. 
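
The Type 1 extreme can be illustrated numerically. The sketch below is not the thesis's Table 1; it assumes, purely for illustration, that A is the reciprocal of d truncated to j fractional bits, and it only reports the rounded product A rp. Whether that digit is acceptable depends on the quotient selection criterion of Section 2.3, which is not modelled here. All names are hypothetical.

```python
import math
from fractions import Fraction

def reciprocal_estimate(d, j):
    """Stand-in for Table 1: 1/d truncated to j bits right of the radix point."""
    return Fraction(math.floor(Fraction(1) / Fraction(d) * 2**j), 2**j)

def type1_digit(rp, d, j):
    """Rounded integer portion of the product A*rp (Type 1 selection)."""
    A = reciprocal_estimate(d, j)
    return round(A * Fraction(rp))
```

Under these assumptions the selected digit differs from rp/d by at most 1/2 + |rp| 2^(-j), which shows how the precision of A trades against the tolerance available from redundancy.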


3.  DEFINITION  OF  COST  AND  PERFORMANCE 

3.1   Preliminary  Remarks 

To  this  point  in  the  thesis  we  have  defined  a  division  procedure 
which generates a quotient by successive calls to a lower precision, model
division  unit.   A  generalized  structure  of  the  model  division  was  proposed  and 
now  we  begin  to  consider  the  synthesis  of  such  a  unit. 

Besides  the  definitive  aspects  of  this  work,  a  major  goal  is  to 
derive  useful  estimates  of  minimal  cost  and  performance  as  functions  of  the 
design  parameters  of  the  generalized  structure  in  Figure  2.   Design  parameters 
include  such  quantities  as  radix,  r;  magnitude  of  maximum  quotient  digit,  n; 
and  the  point  of  truncation  of  rp  and  d.   In  this  section,  the  important  boxes 
of  Figure  2  are  made  sufficiently  specific  to  allow  a  measure  of  minimal  cost 
and  performance  to  be  proposed. 

In  finding  a  measure  of  cost  or  performance,  the  designer  is  faced 
with  a  trade-off  between  generality  and  accuracy.   Determining  absolute  cost 
or  absolute  performance  is  very  much  dependent  upon  hardware  and  details  of 
implementation;  but  restricting  the  study  to  a  specific  class  of  logic  limits 
the  significance  of  the  work.   Questions  of  minimization  are  further  complicated 
by  controversy  as  to  what  to  minimize. 

This  work  makes  a  compromise.   Since  much  of  the  emphasis  is  on 
comparison,  a  relative  measure  of  cost  and  performance  is  adequate.   On  the 
other  hand,  some  estimate  of  absolute  cost  is  desirable.   The  higher-radix, 
table  look-up  schemes  offer  potentially  high  performance  but  require  a  larger 
number  of  gates  to  construct.  Whether,  in  fact,  they  are  at  all  feasible  for 
a real machine strongly depends upon the absolute cost.

3.2  Definition  of  Cost 

3.2.1  Preliminary  Remarks 

For  this  study,  the  cost  of  a  logic  network  is  defined  as  the  total 
number  of  literals  required  to  implement  the  network  in  two-level,  sum-of- 
products  (AND-OR  or  equivalent)  logic.   The  choice  ignores  fan-in  and  fan-out 
restrictions, but this shortcoming is outweighed by the following considerations.

1.  The  logical  definitions  of  the  networks  are  in  a 
canonical  form  which  can  serve  as  an  input  to  a  specific 
minimization  and/or  design  automation  package. 

2.  The networks are realized in the theoretical minimum
number of circuit delays and thus will be an upper
bound on speed and cost.

3.  The  tables  for  higher-radix  structures  are  candidates 
for  LSI.   In  this  case  the  number  of  literals  is  a  measure 
of  silicon  area  required  and  power  dissipation  requirements. 

4.  A very efficient computer program for sum-of-products
minimization is available to the author.

The  cost  of  implementing  the  structure  shown  in  Figure  2  is  the  sum 
of  the  costs  of  implementing  each  sub-block.   Symbolically, 

$$ C = C_{DEF} + C_{T1} + C_{M1} + C_{M2} + C_{PREF} + C_{T2} + C_{R} \tag{3.1} $$

where

$C$ is the total cost,
$C_{DEF}$ is the cost of the Divisor Estimate Formation block,
$C_{T1}$ is the cost of Table 1,
$C_{M1}$ is the cost of Multiplier 1,
$C_{M2}$ is the cost of Multiplier 2,
$C_{PREF}$ is the cost of the Partial Remainder Estimate Formation block,
$C_{T2}$ is the cost of Table 2, and
$C_{R}$ is the cost of the Quotient Recode block.


At this point, it is convenient to introduce intermediate variables,
$C_{TMM}$ and $C_{DPQ}$, and group the cost terms as follows:

$$ C_{TMM} = C_{T1} + C_{M1} + C_{M2} \tag{3.2} $$

$$ C_{DPQ} = C_{DEF} + C_{PREF} + C_{R} \tag{3.3} $$

The cost terms $C_{T2}$, $C_{TMM}$, and $C_{DPQ}$ are functionally related to the
design parameters such as radix, maximum quotient digit, range of divisor, and
uncertainty in the estimates of the divisor and remainders. The terms $C_{T2}$ and
$C_{T1}$ in $C_{TMM}$ are the most complex and will be studied by computer synthesis.
Estimates of $C_{DPQ}$ and the remaining terms of $C_{TMM}$ will be obtained manually as
required. In most cases, the term $C_{DPQ}$ is dominated by $C_{T2} + C_{TMM}$ and may be
neglected.


3.2.2  Structure  for  Finding  Cost  of  Table  2 

Table 2 will be studied as a multiple-output logic network. It may
be represented as shown in Figure 3. The functions $f_0$ through $f_n$ are Boolean
functions of the bit vectors corresponding to $\hat{d}$ and $\hat{rp}$. These vectors are
denoted $\underline{d}$ and $\underline{rp}$, respectively.

Figure 3.   Network Definition of Table 2
(Block diagram: the multi-output logic network TABLE 2, with inputs $\hat{rp}$ and $\hat{d}$
and outputs $f_0(\underline{d}, \underline{rp})$, $f_1(\underline{d}, \underline{rp})$, ..., $f_n(\underline{d}, \underline{rp})$.)


In specifying the quotient selection criterion (Section 2.3), every
pair (d, rp) has been associated with a set, I, of quotient digits which the
quotient selection mechanism may generate when given inputs (d, rp). The
functions $f_0, f_1, \ldots, f_n$ must be found such that for every ordered pair
(d, rp) with allowable quotient digit set, I,

$$ f_i(d, rp) = 1 \text{ for one and only one } i \in I, \tag{3.4} $$

and

$$ f_k(d, rp) = 0 \text{ for all other values, } k \ne i. \tag{3.5} $$

In other words, every pair (d, rp) in the set D x P must cause one and only one
of the outputs to be true, and this output must correspond to a correct
quotient digit.
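
Conditions (3.4) and (3.5) can be checked mechanically for any candidate set of output functions. The following is a small sketch; representing the functions as Python callables and the allowable sets I as a dictionary is an assumption for illustration, and all names are hypothetical.

```python
def selection_violations(functions, allowed):
    """List the input pairs that violate (3.4) or (3.5).

    functions: sequence of callables; functions[i](d, rp) is true when
               output f_i is asserted, so index i is the quotient digit.
    allowed:   dict mapping each pair (d, rp) to its set I of allowable digits.
    """
    violations = []
    for (d, rp), digits in allowed.items():
        asserted = [i for i, f in enumerate(functions) if f(d, rp)]
        # Exactly one output must be true, and it must lie in the set I.
        if len(asserted) != 1 or asserted[0] not in digits:
            violations.append((d, rp))
    return violations
```

An empty result means that every listed pair asserts exactly one output and that this output is an allowable digit.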

Due to the overlap of adjacent quotient regions produced by redundancy,
many elements in D x P may have sets, I, containing more than one
element; thus many sets of different functions are allowable for given design
parameters. But our wish to compare minimal* costs imposes another constraint,
namely, that the cost of the multiple output network (as defined in Section
3.2.1) is minimal. Symbolically stated: the requirement is that
$\text{Cost}(f_0 + f_1 + f_2 + \cdots + f_n)$ be minimal.

In the general minimization of two-level, AND-OR realization of a
multiple-output network, it is necessary to generate the prime implicants of
each of the individual output functions, plus the prime implicants of the
functions which are equal to all possible products of two output functions,
three output functions, etc. Each product is a multiple-output prime implicant.
McCluskey [33] states the following theorem of use here:

Theorem:   For any definition of network cost such that the
cost does not increase when a gate or gate input is removed,
there exists at least one minimum-cost, two-stage network in
which the corresponding expressions for the output functions,
$f_j$, are all sums of multiple-output prime implicants. All
the product terms which occur only in the expression for $f_j$
are prime implicants of $f_j$; all the product terms which
occur in both the expressions for $f_j$ and $f_k$ but in no other
expressions are prime implicants of $f_j \cdot f_k$, etc.

But in the present case, no two functions are ever simultaneously
true and thus none of the prime implicants of $f_j$ are contained in any other
function, $f_k$, $k \ne j$. Thus, by the theorem stated above, there exists a minimum
cost two stage network which may be found by minimizing each function
independently of the rest, i.e.

$$ \text{Min Cost}(f_0 + f_1 + \cdots + f_n) = \text{Min Cost}(f_0) + \text{Min Cost}(f_1) + \cdots + \text{Min Cost}(f_n). $$


*The term minimal implies that we wish to find any one of possibly more than
one minimum cost implementation.
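
The cost measure of Section 3.2.1 and the additivity argued above can be made concrete with a small sketch. It assumes, for illustration only, that each product term is written as a string over '0', '1' and '-' (a '-' marks a variable absent from the term); the function names are hypothetical.

```python
from itertools import combinations

def literal_count(cover):
    """Number of literals in a sum-of-products cover (a list of cubes)."""
    return sum(1 for cube in cover for ch in cube if ch != '-')

def cubes_intersect(a, b):
    """True when two cubes share at least one input combination."""
    return all(x == y or '-' in (x, y) for x, y in zip(a, b))

def outputs_disjoint(covers):
    """True when no two output functions can be true simultaneously."""
    return not any(cubes_intersect(p, q)
                   for fa, fb in combinations(covers, 2)
                   for p in fa for q in fb)

def network_cost(covers):
    """Total literal count, summed over independently defined outputs."""
    return sum(literal_count(cover) for cover in covers)
```

When `outputs_disjoint` holds, no product term can be shared between outputs, which is the situation in which the per-function costs simply add.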

3.2.3  Structure for Finding Cost of Table 1 and the Multipliers

As with Table 2, Table 1 will be defined as a multiple-output logic
network as shown in Figure 4. The input is $\underline{d}$, the bit-vector representation
of $\hat{d}$. The outputs are the variables $a_{-1} = g_{-1}(\underline{d})$, $a_0 = g_0(\underline{d})$, ..., $a_j = g_j(\underline{d})$,
where $g_i$ is a Boolean function. The bits $a_{-1}$ through $a_j$ comprise the
binary representation of the inverse of d, A. Unfortunately, in this case, we
cannot constrain the problem so that none of the outputs are simultaneously
true. For purpose of estimation, however, it will be assumed that the results
obtained by minimizing each function independently will yield an adequate
estimate of the minimum cost, i.e.
$C_{T1} = \text{Min Cost}(g_{-1}) + \text{Min Cost}(g_0) + \cdots + \text{Min Cost}(g_j)$.


Figure 4.   Network Definition of Table 1
(Block diagram: the multi-output logic network TABLE 1, with input $\hat{d}$ and
outputs $a_{-1}, a_0, a_1, \ldots, a_j$, the bits of A.)

We now consider the cost of the multipliers. It is beyond the scope
of this work to develop a cost-performance analysis for multiplication
structures. The approach adopted here is to present a structure which experience
has shown to be efficient and to approximate the multiplier cost from the structure. More
information about such a structure may be found in [8].

The multiplier is illustrated in Figure 5. It consists of a cascade
of limited carry-borrow adder-subtracters together with shift-gates (S.G.) which
form the necessary multiples of the multiplicand (rp). Shift gate SG0, in
conjunction with complementing circuits, forms the multiples $\pm 1$ and $\pm 2$ times; SG1
forms $\pm 4$, $\pm 8$ times; and, in general, SGi forms multiples of $\pm 2^{2i}$ and $\pm 2^{2i+1}$ times the
multiplicand. The multiples are selected by a recoding of $a_{-1}$ through $a_j$.
Appended to the output of the last adder is hardware which converts the
products from the redundant representation produced by the limited-carry or
borrow device to a non-redundant format. The cost of Multiplier 1, $C_{M1}$, will
be defined by

$$ C_{M1} = j\,C_R + N_A N_B C_A + (N_A + 1) N_B C_{SG} + N_B C_C \tag{3.6} $$


where

$C_R$ is the cost per input digit of the recoding logic,
$N_A$ is the number of adders in the multiplier and is given by
$N_A$ = Integer portion of $(j + 1)/2$,
$N_B$ is the number of bit positions per adder and is given by
$N_B = \varepsilon + \log_2 r$,
$C_A$ is the cost of one position of an adder,
$C_{SG}$ is the cost of one position of a shift gate,
$C_C$ is the cost of converting one digit from redundant to
non-redundant form (assuming the use of look ahead techniques).
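
Equation 3.6 is straightforward to evaluate once unit costs are chosen. The sketch below assumes the reading $C_{M1} = jC_R + N_A N_B C_A + (N_A+1) N_B C_{SG} + N_B C_C$ with $N_A = \lfloor (j+1)/2 \rfloor$ and $N_B = \varepsilon + \log_2 r$; the function name and the unit costs used with it are hypothetical placeholders, not values from the thesis.

```python
def multiplier1_cost(j, eps, r, c_r, c_a, c_sg, c_c):
    """Evaluate the Multiplier 1 cost model of Equation 3.6.

    j: index of the low-order bit of A; eps: bits right of the radix
    point in rp; r: radix of the model division (assumed a power of two).
    c_r, c_a, c_sg, c_c: unit costs C_R, C_A, C_SG, C_C.
    """
    if r < 2 or r & (r - 1):
        raise ValueError("radix is assumed to be a power of two")
    n_a = (j + 1) // 2                 # number of adders
    n_b = eps + (r.bit_length() - 1)   # bit positions per adder
    return j * c_r + n_a * n_b * c_a + (n_a + 1) * n_b * c_sg + n_b * c_c
```

The same function applies to Multiplier 2 if the adder width $N_B$ is replaced as the text describes.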

Figure 5.   Structure of Multipliers

The quantity, j, is the index of the low-order bit of A, the
approximation of $d^{-1}$ ($A = a_{-1}a_0.a_1a_2 \ldots a_j$), $\varepsilon$ is the number of bits to the right of the
radix point in rp, and r is the radix of the model division. As the need
arises, estimates of minimum values of $C_R$, $C_A$, $C_{SG}$, and $C_C$ may be obtained.
The cost of M2, $C_{M2}$, is given by Equation 3.6 with $(\varepsilon + \log_2 r)$
replaced by $\delta$, the number of bits in d.


3.3  Definition  of  Performance 

3.3.1  Performance  of  the  Model  Division 

Performance will be measured in units of number of bits of quotient
generated per gate delay. For practical cases, the number of bits of quotient
generated by the model division is $\log_2 r$. Since the divisor is constant for
a given division operation, the operating time of the model division is limited
by the paths driven by the remainder. The time, $T_Q$, in gate delays, required
to produce a quotient digit, radix r, is given by

$$ T_Q = T_{PREF} + T_{M1} + T_{T2} + T_R \tag{3.7} $$

where

$T_{PREF}$ is the number of logic delays required in forming the
estimate of the remainder,
$T_{M1}$ is the number of logic delays required to form A rp in
Multiplier 1,
$T_{T2}$ is the number of logic delays to select a quotient digit
in Table 2, and
$T_R$ is the number of logic delays to recode the output of T2.

Performance of the quotient selector, $P_Q$, is therefore given by

$$ P_Q = \frac{\log_2 r}{T_Q} \tag{3.8} $$
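
As a worked illustration of the performance measure, the sketch below sums the four delay terms and divides the number of quotient bits per digit, $\log_2 r$, by the result. The function name and the sample delays are hypothetical.

```python
import math

def selector_performance(r, t_pref, t_m1, t_t2, t_r):
    """Quotient bits generated per gate delay by the model division.

    r is the radix; the remaining arguments are the logic delays of the
    remainder estimate formation, Multiplier 1, Table 2 and the recoder.
    """
    t_q = t_pref + t_m1 + t_t2 + t_r   # delays per quotient digit
    return math.log2(r) / t_q
```

For example, a radix 4 selector with a total of four gate delays per digit yields half a quotient bit per gate delay.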


3.3.2  Performance  of  the  Full  Precision  Division 

The measure of primary interest is the performance of the full
precision division. We shall assume a full precision multiplication structure
similar to that shown in Figure 5. It consists of a cascade of K
adder-subtracters each of which is capable of retiring K' bits of the multiplier. The
effective radix for multiplication is therefore $r_M = 2^{KK'}$.

Let

$M$ be the quotient length in bits,
$T_D$ be the number of logic delays required for the iterative
steps of division,
$T_A$ be the number of logic delays required to add two full
precision numbers,
$T_C$ be the number of logic delays required for control after
the quotient bits have been generated by the quotient
selector, and
$N_Q$ be the number of calls to the quotient selector.

Then,

$$ T_D = \frac{M}{K'} T_A + N_Q (T_Q + T_C) \tag{3.9} $$

where, if r is the radix of the model division,

$$ N_Q = \frac{M}{\log_2 r} \tag{3.10} $$

For this study, K' = 2, thus

$$ T_D = M \left[ \frac{T_A}{2} + \frac{T_Q + T_C}{\log_2 r} \right] \tag{3.11} $$

The performance of the full precision division is defined by

$$ P_D = \frac{M}{T_D} = \frac{2 \log_2 r}{T_A \log_2 r + 2 (T_Q + T_C)} \tag{3.12} $$
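
The relation between the division time and the performance figure can be checked numerically. The sketch below assumes the reading $T_D = (M/K')\,T_A + N_Q (T_Q + T_C)$ with $N_Q = M/\log_2 r$, and $P_D = M/T_D$; under that assumption and K' = 2 the closed form for $P_D$ does not depend on M. Function names and sample delays are hypothetical.

```python
import math

def division_time(M, r, t_a, t_q, t_c, k_prime=2):
    """Logic delays for the iterative steps of an M-bit division."""
    n_q = M / math.log2(r)             # calls to the quotient selector
    return (M / k_prime) * t_a + n_q * (t_q + t_c)

def division_performance(r, t_a, t_q, t_c):
    """Quotient bits per gate delay for K' = 2 (closed form)."""
    lg = math.log2(r)
    return 2 * lg / (t_a * lg + 2 * (t_q + t_c))
```

Comparing `M / division_time(...)` with `division_performance(...)` for the same delays is a quick consistency check on the two expressions.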


4.  ALGORITHMS  FOR  SYNTHESIS  AND  ANALYSIS

4.1  Preliminary  Remarks

The  derivation  of  cost  and  performance  functions  by  a  direct, 
analytic approach is complicated by the discrete nature of these functions and
by  the  large  number  of  variables.   An  empirical,  constructive  approach  was 
therefore  adopted.   The  first  phase  of  the  experiment  (the  topic  of  this 
section)  required  the  formulation  of  a  systematic  approach  to  the  synthesis  of 
a  minimal  cost,  mathematically  accurate,  quotient  selection  mechanism  for  a 
given  set  of  design  parameter  values.   Although  the  synthesis  routines  in 
themselves  would  be  of  use  in  designing  a  quotient  selection  mechanism,  in 
this  study  they  are  used  as  tools  in  studying  the  cost  and  performance 
functions.   We  are  performing  analysis  by  means  of  computer-aided  synthesis. 

In  the  second  phase  of  the  experiment,  the  programs  developed  in  the 
first  phase  were  run  with  various  combinations  of  parameter  values  in  order  to 

estimate  cost  and  performance.   The  results  of  each  run  might  be  thought 
of  as  determining  a  point  on  a  cost  versus  performance  curve.   The  hope  is 
that  only  a  few  runs,  relative  to  all  possible  parameter  combinations,  would 
be necessary in order to find approximations which would be useful for
interpolation and extrapolation.

But this empirical approach is not without major practical
problems. There are a huge number of possibilities for parameter values, and the
minimization problems are very large and demanding of computer time. These
problems were mitigated by restricting the values of parameters to those of
practical importance and by concentrating on the effects of dominant parameters.

As discussed in Section 3, the dominant cost term for a Type 2
structure is $C_{T2}$, the cost of Table 2. For a Type 1 structure, although the
cost of Table 1 ($C_{T1}$) may not dominate the cost of the multiplier, it is the
least studied term. The following sub-sections describe
algorithms which generate the logic equations defining Table 1 and Table 2
for given values of design parameters. The algorithms do not produce a
definition of the other blocks of Figure 2, but do place some constraints upon
their structure.

4.2  Deriving  a  Minimal  Cost  Design  for  Table  2

Conceptually, Table 2 in Figure 2 is a direct implementation of a
P-D plot. To implement a given P-D plot, a relation must be defined from the
set D x P to a subset of D x P, $\hat{D} \times \hat{P}$, such that each element of D x P maps
into an element of $\hat{D} \times \hat{P}$ and with error bounds for each element ($\hat{d}$, $\hat{rp}$) such
that the quotient selection criterion is satisfied. Note that we have not
required that the relation be a function, since, due to redundant
representation, the same rp-value or d-value may map into different $\hat{rp}$ or $\hat{d}$ values;
uniqueness is not guaranteed. For practical reasons the relation is restricted
to those which may be defined by the successive operations of truncation and
assimilation (conversion to a non-redundant form). Even within this restriction,
however, there are many possible alternatives. The maximum amount of
truncation error which may be tolerated for a given pair (d, rp) depends upon the
location of the point. There is also a trade-off between $\varepsilon$ and $\delta$, the points of
truncation of rp and d, respectively.

The  following  is  a  list  of  the  steps  in  the  process  of  deriving  a 
minimal  cost  design  for  Table  2. 

1.  Set the values for design parameters:
n, r, a, b, $\alpha'$, $\beta'$, $\gamma'$, $\lambda'$, $\varepsilon$, $\delta$.*

2.  Run the program QS3 (described in Section 4.2.1) to produce
a sum-of-products (minterm) definition of each output function
of Table 2.

3.  Run the program, PI, with each set of minterms produced by
QS3 as input. The program PI finds all prime implicants of
the functions, identifies the essential prime implicants, and
generates the constraints which must be satisfied in order to
cover the function.

4.  Run an Integer Linear Programming routine to find a minimal
cost set of prime implicants which satisfy the constraints
produced in step 3. The cost of a prime implicant is the
number of literals. The combination of the prime implicants
selected in this step, together with the essential prime
implicants identified in step 3, defines the Boolean function.

5.  Tabulate the total number of literals required to define
each output function. The total of these values will be
taken as the cost of implementing Table 2.
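
Step 3 of the list above can be illustrated with the textbook Quine-McCluskey tabulation. The sketch below is not the author's PI program, whose internals are not described here; it is a minimal stand-in that finds the prime implicants of a function given as a list of minterms and marks those that are essential. Cubes are strings over '0', '1' and '-', and all names are hypothetical. The covering step (step 4) is not shown.

```python
from itertools import combinations

def combine(a, b):
    """Merge two cubes that differ in exactly one specified bit, else None."""
    diff = [k for k, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        k = diff[0]
        return a[:k] + '-' + a[k + 1:]
    return None

def prime_implicants(minterms, nbits):
    """All prime implicants of the function defined by its minterms."""
    cubes = {format(m, '0{}b'.format(nbits)) for m in minterms}
    primes = set()
    while cubes:
        merged, used = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= cubes - used         # cubes that merged with nothing
        cubes = merged
    return primes

def covers(cube, minterm, nbits):
    """True when the cube contains the given minterm."""
    bits = format(minterm, '0{}b'.format(nbits))
    return all(c in ('-', b) for c, b in zip(cube, bits))

def essential_primes(primes, minterms, nbits):
    """Prime implicants that are the only cover of some minterm."""
    essential = set()
    for m in minterms:
        hits = [p for p in primes if covers(p, m, nbits)]
        if len(hits) == 1:
            essential.add(hits[0])
    return essential
```

The number of non-dash characters in the selected cubes is then the literal count used as the cost in step 5.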

*Initially, Table 2 is studied apart from T1, M1, and M2; A = f(d) = 1.

4.2.1  Defining  the  Output  Functions

As described in Section 3.2.2, Table 2 is treated as a multiple
output network. This section describes an algorithm for specifying these
functions as sums-of-products of minterms. The minterms are formed by
concatenating bit vectors, $\underline{rp}$, with bit vectors, $\underline{d}$. A Fortran program called QS3
(Quotient Selection Program 3) was written to accept design parameters and to
produce the minterm definitions of each of the output functions, $f_0(\underline{rp}, \underline{d})$,
..., $f_n(\underline{rp}, \underline{d})$.

The  derivation  will  be  restricted  to  the  first  quadrant  (positive 
rp  and  d)  of  the  P-D  plot.   The  full  P-D  plot  is  symmetric  about  both  axes  and 
thus  the  cost  of  implementing  one  quadrant  is  a  good  estimate  of  the  cost  of 
implementing  any  other. 

Figure 6 illustrates a portion of the first quadrant of a P-D plot.
Three adjacent quotient regions, q(i+1), q(i), and q(i-1), are designated
together with the horizontal line, rp = $\hat{rp}$ = m$\Delta$rp. Every line of this form will
be designated an "rp-line". The quantity, m, is an integer, and $\Delta rp = 2^{-\varepsilon}$.
The task of defining the output functions for Table 2 may be reduced to that of
assigning adjacent sections of every rp-line to one and only one q-region. For
example, the segment of the rp-line between d = a and d = b must be subdivided
into three segments: one in each q-region shown. The dividing line between
adjacent line segments assigned to q(i) and q(i+1) will be called the
"divisor transition value between q(i) and q(i+1)." A divisor transition value
between q(i) and q(i+1) may be picked from a sub-range of the divisor between
the intersections of the rp-line and the boundaries of the overlap region.
The range in which the divisor transition value may be chosen is determined
as follows.

Figure 6.   Portion of P-D Plot Illustrating Segmentation of rp-line
(Plot of rp against d for d = a to d = b, showing the rp-line rp = $\hat{rp}$ = m$\Delta$rp
crossing the lower bound of q(i+1), the upper bound of q(i), the lower bound
of q(i), and the upper bound of q(i-1).)


Let $d_t$ be the divisor transition value for rp = $\hat{rp}$, between q(i) and
q(i-1). Then the ordered pair (rp, $d_t$) will be representative of all (rp, d)
in the rectangle shown in Figure 7. Since $d_t$ is a transition value, ($d_t$, rp)
implies a quotient digit of i-1 and ($d_t - \Delta d$, rp) implies a quotient digit of i.

The rectangle corresponding to ($d_t$, rp) must be completely within the
q(i-1) region. The strictest bound is therefore at the upper, lefthand corner
of the rectangle in Figure 7, and thus the following must hold.

$$ rp + \gamma \le (i - 1 + \rho)(d_t - \alpha) \tag{4.1} $$

Figure 7.   Portion of a P-D Plot Illustrating Constraints in
Finding Divisor Transition Interval
(Plot of $\hat{rp}$ against d showing the upper bound of q(i-1), rp = $(i-1+\rho)d$,
and the lower bound of q(i), rp = $(i-\rho)d$.)


Similarly, the rectangle corresponding to ($d_t - \Delta d$, rp) must be
completely within the q(i) region. The strictest bound in this case is at the
lower, righthand corner of the rectangle where the following must hold.

$$ rp - \lambda \ge (i - \rho)(d_t - \Delta d + \beta) \tag{4.2} $$

In practical cases, to insure that all d values map into at least
one $\hat{d}$ value, $\Delta d = \beta$ and thus (4.2) becomes

$$ rp - \lambda \ge (i - \rho)\,d_t \tag{4.3} $$

Combining (4.1) and (4.3) yields a range restriction on $d_t$, namely,

$$ \frac{rp + \gamma}{i - 1 + \rho} + \alpha \le d_t \le \frac{rp - \lambda}{i - \rho} \tag{4.4} $$
Note that the strategy is to select the size of the rp-steps, $\Delta rp$,
and to allow the algorithm to find the maximum size steps allowable for d.
Theoretically, the program could be designed such that $\Delta d$ would be specified
and the precision requirements for the partial remainder would be determined.
The former approach is taken due to the fact that control of $\Delta rp$ is more
critical. The precision of the estimate of the partial remainder (the number
of bits) should be kept low in order to keep down the time required to convert
from a redundant to a non-redundant form. The logic paths involving rp, as
opposed to those involving d, are changing with each call to the model
division. For this reason there is motivation to simplify the logic involving
only rp at the expense of complicating the logic involving only d. It should
also be realized that the precision requirements on the estimate of the
partial remainder are based upon worst case calculations. Although QS3 uses
this worst case precision uniformly in generating the divisor precision
requirements, the minimization routines will remove unneeded precision.

The quantity, $d_t$, may be any value in the range defined in Equation
4.4. Since the design goal is to minimize the total number of literals
required to implement the table, $d_t$ is picked to be a number which can be
represented with the fewest bits. In other words, if all numbers in the range
specified by (4.4) are represented as the ratio of two integers in the form
$N/2^M$, the $d_t$ selected is one satisfying (4.4) and with the minimum value of M.
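
Selecting the number of the form $N/2^M$ with the smallest M inside a given interval can be sketched as a simple search over M. This is an illustration only; the function name, the bound on M, and the choice of the smallest N when several values share the minimum M are assumptions not taken from the source.

```python
import math
from fractions import Fraction

def simplest_dyadic(low, high, max_bits=32):
    """Return (N, M) with low <= N / 2**M <= high and M as small as possible.

    Returns None if no such value exists with M <= max_bits.
    """
    low, high = Fraction(low), Fraction(high)
    for M in range(max_bits + 1):
        N = math.ceil(low * 2**M)      # smallest candidate numerator at this M
        if Fraction(N, 2**M) <= high:
            return N, M
    return None
```

For the interval from 51/80 to 3/4, the search returns N = 3 and M = 2, that is, the value 3/4, which needs only two bits to the right of the radix point.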

Using the algorithm of selecting the simplest binary number in the
allowable divisor transition ranges, the rp-line in Figure 6 is divided into
three segments, as follows:

Segment                         Assigned to

$a \le d < d_{t1}$              q(i+1)

$d_{t1} \le d < d_{t2}$         q(i)

$d_{t2} \le d \le b$            q(i-1)

where $d_{t1}$ and $d_{t2}$ are the divisor transition values between q(i+1) and q(i)
and between q(i) and q(i-1), respectively.

The segments are next defined as minterms and the minterms are assigned to the
appropriate output function, $f_{i+1}$, $f_i$, $f_{i-1}$, etc.

The complete family of rp-lines is produced by stepping along the
rp-axis (beginning at 0) in increments of $\Delta rp$. By segmenting the rp-lines at
the boundaries of the P-D plot and the divisor transition values, each quotient
region, q(i) for i = 0 through n, is defined by a set of triplets of the form

$$ (d_{l,m},\ d_{r,m},\ m\Delta rp) $$

where

$d_{l,m}$ is the left end of the segment of the mth rp-line in q(i);
$d_{r,m}$ is the right end of the segment of the mth rp-line in q(i); and
$m\Delta rp$ defines the values of rp.

Rather than being stored as triplets, each segment is stored as a set of
minterms.

Given the ordered pair, (d, rp), the minterm equivalent is $rp \,\|\, d$,
where $\|$ denotes bit string concatenation. The minterm may be represented as
a bit string or as the decimal integer equivalent of $rp \,\|\, d$, treated as a binary
integer. Each triplet, ($d_{l,m}$, $d_{r,m}$, $m\Delta rp$), is transformed into a pair of
minterms, ($MINTRM_l$, $MINTRM_r$). Under this transformation, each segment of the
rp-line is logically defined by $MINTRM_l \vee (MINTRM_l + 1) \vee \cdots \vee MINTRM_r$.
The triplets are converted to minterms as follows.

The quantities $d_{l,m}$ and $d_{r,m}$ are all divisor transition values and
are therefore of the form $N/2^{\delta}$. For a given q(i) region, find the largest $\delta$,
$\delta_{max}$, required to represent $d_{l,m}$ or $d_{r,m}$. Then $2^{-\delta_{max}}$ is the maximum precision
required to represent d. Given $d_{l,m} = N_l/D_l$; $d_{r,m} = N_r/D_r$ (both fractions
in reduced form), rp = m$\Delta$rp, and NBDL = the number of bits of the divisor to
the right of the radix point, then

$$ MINTRM_l = m\,2^{(\delta_{max} + NBDL)} + (2^{\delta_{max}} N_l)/D_l \tag{4.5} $$

$$ MINTRM_r = \left( m\,2^{(\delta_{max} + NBDL)} + (2^{\delta_{max}} N_r)/D_r \right) - 1. \tag{4.6} $$
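
The conversion of a triplet into a pair of minterms can be sketched as follows. The sketch assumes that the divisor field of the concatenated bit string is `delta_max + nbdl` bits wide, so that the integer m is shifted left by that amount; this interpretation of NBDL, like the function names, is an assumption for illustration.

```python
from fractions import Fraction

def segment_minterms(m, d_left, d_right, delta_max, nbdl):
    """Return (MINTRM_l, MINTRM_r) for one rp-line segment, per (4.5) and (4.6).

    m: rp-line index (rp = m * delta_rp); d_left, d_right: segment ends,
    assumed exactly representable with delta_max fractional bits.
    """
    shift = 2 ** (delta_max + nbdl)    # weight of the rp field
    scale = 2 ** delta_max             # converts d to an integer
    left = m * shift + int(Fraction(d_left) * scale)
    right = m * shift + int(Fraction(d_right) * scale) - 1
    return left, right

def expand_segment(left, right):
    """Minterm numbers MINTRM_l, MINTRM_l + 1, ..., MINTRM_r."""
    return list(range(left, right + 1))
```

The right end is excluded from the segment, which is why (4.6) subtracts one.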

A useful estimate of the number of minterms required to define a
given q(i) region may be derived. The QS3 algorithm will actually select the
upper and lower boundary of each q(i) region, which will be a stairstep in the
transition region between q(i), q(i+1) and q(i-1). For purposes of
this estimate, assume that the dividing line between q(i) and q(i+1) is
the average value between the upper boundary of q(i) and the lower boundary
of q(i+1). The boundary between q(i) and q(i+1) is thus defined by
rp = (i + 1/2) d. The area of each q(i) region will be defined as the area
between the lines d = a, d = b, rp = (i + 1/2) d, and rp = (i - 1/2) d. Thus,

$$ \text{Area}(q(i)) = \int_a^b x\,dx = (b^2 - a^2)/2. \tag{4.7} $$

The area is independent of the value of the quotient digit. Let $\epsilon$ be the number of bits to the right of the radix point in rp ($\Delta rp = 2^{-\epsilon}$) and $\delta$ be the number of bits to the right of the radix point in $\hat{d}$. Note that the minimum value of $\delta$ may increase with i. If the worst case value of $\delta$ is applied uniformly in defining all quotient regions, the bits of excess precision will become don't care literals in the course of minimization. To reduce the minimization problem, $\delta$ may be treated as a function of i by defining $\delta(i)$ as the minimum number of bits required in $\hat{d}$ in order to correctly define the q(i) region for the given value of $\epsilon$. The number of minterms for each q(i) region, M(i), is thus given by

$M(i) = (b^2 - a^2)\, 2^{(\epsilon + \delta(i) - 1)}.$   (4.8)
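The estimate of Equation 4.8 is simply the area of Equation 4.7 divided by the area of one grid cell, $2^{-\epsilon} \cdot 2^{-\delta(i)}$. A minimal Python sketch follows; the function name and the sample parameter values are assumptions chosen for illustration.

```python
from fractions import Fraction


def estimated_minterms(a, b, eps, delta_i):
    """Estimate M(i) = (b^2 - a^2) * 2^(eps + delta_i - 1), Equation 4.8.

    a, b     -- left and right ends of the divisor range
    eps      -- bits to the right of the radix point in rp
    delta_i  -- bits to the right of the radix point in d for region q(i)
    """
    a, b = Fraction(a), Fraction(b)
    return (b * b - a * a) * Fraction(2) ** (eps + delta_i - 1)
```

With a = 1/2, b = 1, an assumed eps of 4 and delta_i of 4, the estimate is 96 minterms; each additional divisor bit doubles it.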

Figure 8 is an annotated flowchart of the program (QS3) which actually produces the definition of the output functions for Table 2. The following assumptions and conventions should be noted:

1. The program was written in Fortran and thus Fortran notation and variable names are used in the flowchart.

2. In most cases, the Fortran variable names differ from those used in Section 2. Included in the comments are statements which relate the Fortran name to that used in the derivations. For example, DLEFT $\equiv$ a (2.1). The number in parentheses is the section number in which "a" is defined.

3. The divisor is restricted to positive values in a non-redundant representation and thus $\alpha = 0$ in Equation 4.4.

4. Single circles on the flowchart denote entrances; double concentric circles denote exits.
    READ DLNUM, DLDENO, DRNUM, DRDENO
    DLEFT = DLNUM/DLDENO
    DRIGHT = DRNUM/DRDENO

The endpoints of the divisor interval are read in a fractional form. DLNUM and DLDENO are the numerator and denominator, respectively, of the left end. DRNUM and DRDENO are the numerator and denominator, respectively, of the right end.
DLEFT $\equiv$ a (2.1)
DRIGHT $\equiv$ b (2.1)

    READ ERR RP P, ERR RP N

ERR RP P is the maximum positive truncation error in rp; ERR RP N is the maximum negative truncation error in rp.
ERR RP P $\equiv \gamma$ (2.3)
ERR RP N $\equiv \lambda$ (2.3)

    READ N, R

N is the maximum allowable quotient digit. R is the radix.
N $\equiv$ n (2.1)
R $\equiv$ r (2.1)
NR $\equiv \rho$ (2.1)
Note: NR is REAL

Figure 8. Flowchart of QS3 Program
    DELRP = 1./DENOM

DELRP is the increment between successive values of rp. DENOM is defined by an assignment statement prior to this step.
DELRP $\equiv \Delta rp$

JMAX is the upper limit on the index used to step along the rp-axis.

This is the beginning of the outer loop which steps along the rp-axis.

    FJ = J - 1
    RP = FJ/DENOM
    RPU = RP + ERR RP P
    RPL = RP - ERR RP N
    JM1 = J - 1

Compute the present value of RP to be used as rp and also the upper (RPU) and lower (RPL) bounds of the rp values represented by rp.

    IZCK = 0
    IWHICH = 1

Initialize two control variables. If IZCK remains at 0 through the inner loop, which varies the quotient digit, then no divisor transition intervals occur between (a,b). IWHICH = 1 indicates that we are looking for the first divisor transition interval for the present value of rp. In this case, a = DLEFT will be used as the left end of the segment.

Figure 8 (continued). Flowchart of QS3 Program
    QI = (RPU/DLEFT) + 1 - NR
    I = QI + 1
    IF (I.GT.N) I = N
    QI = I

QI, the quotient digit value, is initialized at the greatest value such that part of the line segment formed by RP = FJ/DENOM and the end points of the divisor interval, (a,b), is in the QI-region.

    DUL = RPU/(QI - 1 + NR)

DUL is the left endpoint of the divisor transition interval between QI and QI - 1.

This tests whether or not the transition interval is to the left of the left boundary of the P-D plot. If so, QI is decremented.

A divisor transition interval within (a,b) has been found.

This tests whether or not the divisor transition interval is to the right of the right boundary of the P-D plot. If so, continue with new RP-value.

Figure 8 (continued). Flowchart of QS3 Program
    DUR = RPL/(QI - NR)

DUR is the right endpoint of the divisor transition interval between QI and QI - 1.

    CALL DT (DUL, DUR, NN, MM)

Subroutine DT selects the divisor transition value between DUL and DUR. The value selected is returned in a fractional form (NN/MM). MM = $2^m$, where m is the smallest integer such that DUL $\le$ NN/MM $\le$ DUR.

    CALL MINTAL (IWHICH, NN, MM, J-1, QI)

Subroutine MINTAL creates the minterm definition of the rp-line segments. If IWHICH = 1, then DLNUM/DLDENO is the left end of the segment and NN/MM is the right end. If IWHICH = 0, then the value of NN/MM on the previous call to MINTAL is the left end and the present NN/MM is the right end. J-1 denotes the rp-line and QI, the quotient region.

    IWHICH = 0

Set IWHICH = 0.

Figure 8 (continued). Flowchart of QS3 Program
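The two computations at the heart of this part of the flowchart, the endpoints DUL and DUR and the selection performed by subroutine DT, can be sketched as follows. This is a Python illustration, not the original Fortran; the function names are hypothetical, and the closed interval used in the search is an assumption about how DT treats its endpoints.

```python
from fractions import Fraction
from math import ceil


def transition_interval(rp, qi, rho, gamma, lam):
    """Return (DUL, DUR), the ends of the divisor transition interval
    between quotient digits qi and qi - 1 for one rp-line."""
    dul = (rp + gamma) / (qi - 1 + rho)   # left end, from the upper bound RPU
    dur = (rp - lam) / (qi - rho)         # right end, from the lower bound RPL
    return dul, dur


def simplest_binary_fraction(dul, dur, max_bits=32):
    """Return (NN, MM) with MM = 2**m for the smallest m such that
    dul <= NN/MM <= dur (assumed closed interval)."""
    for m in range(max_bits + 1):
        mm = 2 ** m
        nn = ceil(dul * mm)               # smallest numerator not below dul
        if Fraction(nn, mm) <= dur:
            return nn, mm
    raise ValueError("no binary fraction found within max_bits")
```

For rp = 1, QI = 2, a redundancy ratio of 2/3 and truncation errors of 1/16 in both directions, the interval is 51/80 to 45/64 and the value selected is 11/16.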
Decrement QI.

Check whether or not all QI-regions have been accounted for.

    NN = DRNUM
    MM = DRDENO
    CALL MINTAL (IWHICH, NN, MM, J-1, QI)

Use DRNUM/DRDENO as the right end of the last rp-line segment for the present value of rp.

End of DO-Loop which increments rp.

Figure 8 (continued). Flowchart of QS3 Program
4.2.2 Minimizing the Output Functions

A discussion of the minimization of two level switching circuits is beyond the scope of this thesis. However, this section sketches the approach used in this work and references a detailed description of the algorithms. These algorithms are noteworthy because they will minimize functions of many variables involving many minterms. In the present work they have been used to minimize functions of 19 variables with over 3100 minterms.

The program QS3 generates a sum-of-products (each product is a minterm) definition of each output function. For each function, the remaining tasks are: 1) to obtain all the prime implicants of the function; and 2) to select a minimal cover which consists of some subset of all prime implicants.

The program used to accomplish step 1 was recently developed by V. G. Tareski [34]. It is an extension of an algorithm developed by Carroll et al. [35] in late 1968. Tareski has coded his improved version of the algorithm in both PL/1 and Fortran IV on the IBM 360/75.

The output from the program (PI for Prime Implicant) is a list of prime implicants, each in the form:

TTTTTT, where T is

1 if the corresponding variable appears in the true form;
0 if the corresponding variable appears in the complement form; and
X if the corresponding variable is not present.

Each prime implicant is assigned an identification number. The PI program also partially solves the covering problem in that it identifies all essential prime implicants. A prime implicant is essential (must be selected for the covering) if it covers a cell in the n-cube representation of the function which is not covered by any other prime implicant.
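The prime implicant notation and the definition of an essential prime implicant can be made concrete with a small sketch. It is not the PI program of [34]; the function names are hypothetical, and the convention that the leftmost character corresponds to the most significant bit of the minterm is an assumption.

```python
def covers(pattern, minterm):
    """True if a prime implicant written over {'1', '0', 'X'} covers the minterm.
    Assumes the leftmost character is the most significant variable."""
    bits = format(minterm, "0{}b".format(len(pattern)))
    return all(p == "X" or p == b for p, b in zip(pattern, bits))


def literal_count(pattern):
    """Number of literals in a prime implicant (every position that is not X)."""
    return sum(1 for p in pattern if p != "X")


def essential_prime_implicants(implicants, minterms):
    """implicants maps identification number -> pattern.
    Returns the identification numbers that are the only cover of some minterm."""
    essential = set()
    for m in minterms:
        covering = [pid for pid, pat in implicants.items() if covers(pat, m)]
        if len(covering) == 1:
            essential.add(covering[0])
    return sorted(essential)
```

For the three-variable function with minterms 0, 1, 5 and 7 and prime implicants 00X, X01 and 1X1, the first and third are essential, since minterms 0 and 7 are each covered by only one prime implicant.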

The program generates a set of constraint equations which must be simultaneously satisfied to guarantee covering. The constraints are specified by a set of equations, each of which is a Boolean sum of prime implicant identification numbers. The identification number is "true" if the prime implicant is selected; false otherwise. For example, two such equations might be

2 v 5 v 7 = '1'

5 v 9 v 11 = '1'

The set of constraint equations poses a covering problem, i.e., the problem of finding a set of prime implicants which satisfies every equation. The problem is further constrained by the requirement that the sum of the literals of the selected prime implicants be minimal. Fortunately, Liu [36] and Ibaraki et al. [37] recently developed a very efficient algorithm and computer program which will solve this problem. The program accepts the constraint equations together with the number of literals in each prime implicant, and produces a minimal cost covering. These prime implicants together with the essential prime implicants found by the PI program constitute the total function. An example is given in Appendix B.
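The covering problem posed by the constraint equations can be illustrated by an exhaustive search over subsets of prime implicants. This sketch is not the algorithm of [36] or [37], which is far more efficient; it is practical only for very small problems, and the function name and the literal counts in the usage note are invented for illustration.

```python
from itertools import combinations


def minimal_cover(constraints, literals):
    """Find a minimum-literal set of prime implicants satisfying every constraint.

    constraints -- list of lists; each inner list is one Boolean sum of
                   prime implicant identification numbers
    literals    -- dict mapping identification number -> number of literals
    Returns (tuple of selected identification numbers, total literals).
    """
    ids = sorted({pid for c in constraints for pid in c})
    best, best_cost = None, None
    for k in range(1, len(ids) + 1):
        for subset in combinations(ids, k):
            chosen = set(subset)
            # every equation needs at least one selected prime implicant
            if all(chosen & set(c) for c in constraints):
                cost = sum(literals[p] for p in subset)
                if best_cost is None or cost < best_cost:
                    best, best_cost = subset, cost
    return best, best_cost
```

Applied to the two example equations with hypothetical literal counts of 3, 4, 5, 2 and 6 for prime implicants 2, 5, 7, 9 and 11, the search selects prime implicant 5 alone at a cost of 4 literals, since it appears in both equations.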

It must be noted that the minimization program does not make explicit use of "don't care" minterms. If $\epsilon'$ is the total number of bits in rp and $\delta'$ is the total number of bits in d, then the total number of minterms which can be formed by concatenating rp and d is $2^{\delta' + \epsilon'}$. Many of these minterms may not correspond to area within the range of the P-D plot and therefore are don't cares in the sense that they may be arbitrarily added to or deleted from a function depending upon which yields the simplest function. In the cases actually designed, the number of don't cares far exceeds the number of true minterms. For example, with a divisor in the range 1/2 to 1, the number of minterms required to define a P-D plot with $\rho = 2/3$ and a uniform grid of $2^{-\delta} \times 2^{-\epsilon}$ is $.25\, r\, 2^{\delta + \epsilon}$, and the number of don't care minterms is $.75\, r\, 2^{\delta + \epsilon}$. Since in cases studied $\delta + \epsilon$ may be as great as 14, the don't care minterms would severely tax the minimization routines. They have, therefore, not been included explicitly. The potential effect of the don't cares can be approximated in specific cases by considering the following observations:

1. For d in the range $1/2 \le d < 1$, the don't care minterms corresponding to area of the P-D plot to the left of d = 1/2 would eliminate the d bit of weight 1/2 from all output functions of Table 2. The cost in literals is, therefore, reduced by the number of prime implicants.

2. If the don't care minterms above the upper boundary of the q(n) region are combined with the true minterms defining q(n), the output function for q(n) is greatly minimized. The cost of q(n) will, therefore, be neglected in estimating the total cost of Table 2.

3. If the don't care minterms above the upper boundary of the q(n) region are combined with the true minterms defining q(i), $i \ne n$, then some literals may drop out of the bit string corresponding to the integer portion of rp, but none are removed from the fractional part of rp or d. This reduction may be approximated by studying the problem of minimizing a decoder of the integers 0 through n, each of bit length $\log_2 r$. The minterms n + 1 through r - 1 should be treated as don't cares. It has been estimated that this effect will reduce the total cost of Table 2 by about 15%.

4.3 Deriving a Minimal Cost Design for Table 1

This section describes the algorithms used to synthesize a design for Table 1 of a Type 1 structure. The approach can yield only an estimate of minimal cost since the minimization algorithm is applied to each output function independent of the others. Furthermore, it has not been demonstrated that the algorithm used to define the output function necessarily produces a minimal cost design. Despite these shortcomings, the algorithms appear to produce sufficiently accurate results for purposes of cost comparison and for studying trade-offs between the cost of Table 1 and Table 2.

The following is a list of the steps in the process of generating Table 1 and evaluating the cost:

1. Set the values for design parameters: n, r, a, b, $\alpha$, $\beta$, $\gamma$, $\lambda$.

2. Run the program QS4 (described in Section 4.3.1) to produce a sum-of-products (minterm) definition of each output function of Table 1.

3. Run the program PI (Section 4.2.2) with each set of minterms produced by QS4 as input.

4. Run an Integer Linear Programming routine to find a minimal cost set of prime implicants which satisfy the constraints produced in step 3.

5. Tabulate the total number of literals required to define each output function. The total of these values will be taken as the cost of implementing Table 1.

4.3.1 Defining the Output Functions

Generating a quotient digit using a Type 1 structure is accomplished as follows:

1. Given d, form an estimate of d, $\hat{d}$, and from $\hat{d}$ form an estimate of 1/d, A.

2. Form y = rp · A + 1/2.

3. Take the integer portion of y as the quotient digit, i.e., q = I(y).
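The three steps above amount to a multiply, a rounding offset and a truncation. A minimal Python sketch follows, assuming rp is non-negative (the first quadrant of the P-D plot) and that the reciprocal estimate A has already been obtained; the function name is hypothetical.

```python
from fractions import Fraction
from math import floor


def quotient_digit(rp, recip_estimate):
    """Quotient digit q = I(rp * A + 1/2) for a Type 1 structure.

    rp             -- (truncated) partial remainder, assumed non-negative
    recip_estimate -- A, the estimate of 1/d
    """
    y = Fraction(rp) * Fraction(recip_estimate) + Fraction(1, 2)
    return floor(y)   # integer portion of y
```

As an illustration, rp = 3/2 with A = 21/16 (an estimate of 1/d for d near 3/4) gives y = 79/32 and hence q = 2.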

The algorithm consists of two steps:

1. For a given $\Delta rp$, $\gamma$, $\lambda$, n, r, $\alpha$, $\beta$, find a $\Delta d$ such that the selection criterion is satisfied everywhere on the P-D plot.

2. The d-values are of the form $j\Delta d$, where j is an integer. Each d represents a divisor interval d to d + $\Delta d$. For every d, we must find a value of the function A(d) such that if (d, rp) implies q = i, then I(A(d) rp + 1/2) = i.
The strictest bounds occur in the vicinity of the transitions between adjacent quotient regions. For a given d consider rp lines in the vicinity of the intersection of d and the upper boundary of q(i-1) and lower boundary of q(i). See Figure 9.

[Figure 9 shows the boundary lines rp = (i - 1 + $\rho$)d and rp = (i - $\rho$)d.]

Figure 9. Portion of P-D Plot Illustrating Constraints in Finding A(d)
Each rp-line has a divisor transition range between i and i-1 with left end given by

$d_l(rp) = (rp + \gamma)/(i - 1 + \rho) + \alpha$   (4.9)

and right end given by

$d_r(rp) = (rp - \lambda)/(i - \rho)$   (4.10)

This derivation is given in Section 4.2.1.

If

$d \le d_l(rp)$   (4.11)

then a quotient digit of i must be selected and thus a value of A(d) must be found such that

$(i - 1/2)/rp \le A(d) < (i + 1/2)/rp.$   (4.12)

Similarly, if

$d + \Delta d > d_r(rp)$   (4.13)

then a quotient digit of i-1 must be selected and thus an estimate must be found such that

$(i - 3/2)/rp \le A(d) < (i - 1/2)/rp$   (4.14)

For a given value of i and d, find the minimum value of rp such that Equation 4.11 is true. Denote this quantity $rp_{top}$. Also find the maximum value of rp such that Equation 4.13 is true. Denote this value $rp_{bot}$.

Substituting these quantities into Equations 4.12 and 4.14, respectively, yields

$(i - 1/2)/rp_{top} \le A(d) < (i + 1/2)/rp_{top}$   (4.15)

$(i - 3/2)/rp_{bot} \le A(d) < (i - 1/2)/rp_{bot}$   (4.16)

A value of A(d) is needed which satisfies both Equations 4.15 and 4.16. Such a value must be within the range

$(i - 1/2)/rp_{top} \le A(d) < (i - 1/2)/rp_{bot}$   (4.17)
Denote the lower bound of this range LB(i) and the upper bound UB(i). Now for all i, find the maximum value of LB(i) and designate it $LB_{max}$. Find the minimum UB(i) and designate it $UB_{min}$. Then select A(d) such that $LB_{max} \le A(d) \le UB_{min}$ and A(d) is the simplest binary number in the range.
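The bounds on A(d) follow directly from the values of rp found at the top and bottom of each transition. A Python sketch is given below; the dictionaries of rp values are assumed to have been found already by stepping along the rp-axis, and the function name and sample values are hypothetical.

```python
from fractions import Fraction


def reciprocal_bounds(rp_top, rp_bot):
    """Return (LB_max, UB_min) for one divisor value.

    rp_top -- dict mapping i -> minimum rp for which digit i must be selected
    rp_bot -- dict mapping i -> maximum rp for which digit i - 1 must be selected
    LB(i) = (i - 1/2)/rp_top[i] and UB(i) = (i - 1/2)/rp_bot[i].
    """
    lower = [Fraction(2 * i - 1, 2) / rp_top[i] for i in rp_top]
    upper = [Fraction(2 * i - 1, 2) / rp_bot[i] for i in rp_bot]
    return max(lower), min(upper)
```

With illustrative values rp_top = {1: 1/2, 2: 5/4} and rp_bot = {1: 3/8, 2: 9/8}, the result is the interval 6/5 to 4/3, within which the simplest binary number would then be selected as A(d).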

Every value of d is of the form $m\Delta d$ where m is an integer and $\Delta d$ is a negative, integer power of 2. The index, m, is therefore a unique minterm definition of d. Let $a_{-1}, a_0, a_1, \dots, a_j$ be a bit string representation of A(d). Each bit corresponds to a Boolean function of d and thus a Boolean function of m.

$a_{-1} = g_{-1}(m)$
$a_0 = g_0(m)$
$a_1 = g_1(m)$
...
$a_j = g_j(m)$

Each function, $g_i$, is defined as the OR of all d-minterms for which $a_i$ is 1 in the bit string version of A(d). In other words, the set of minterms, $M_i$, corresponding to $g_i$ is

$M_i = \{ m \mid a_i \text{ in } A(m\Delta d) \text{ is } 1 \}.$
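The construction of the minterm sets can be sketched as follows. The sketch assumes that A has already been chosen for every divisor index m and that it is exactly representable in the stated number of output bits; the function name, the parameter names and the sample values are hypothetical.

```python
from fractions import Fraction


def output_minterm_sets(recip_by_m, top_weight_exp, num_bits):
    """Build, for each output bit, the list of divisor minterms m for which
    that bit of A is 1.

    recip_by_m     -- dict mapping m -> A for the divisor value m * (increment of d)
    top_weight_exp -- exponent of the weight of the most significant output bit
    num_bits       -- number of output bits
    Position k in the result is counted from the most significant bit.
    """
    lsb = Fraction(2) ** (top_weight_exp - (num_bits - 1))
    sets = {k: [] for k in range(num_bits)}
    for m in sorted(recip_by_m):
        scaled = Fraction(recip_by_m[m]) / lsb
        if scaled.denominator != 1:
            raise ValueError("A is not representable in the given bits")
        word = int(scaled)
        for k in range(num_bits):
            if (word >> (num_bits - 1 - k)) & 1:
                sets[k].append(m)
    return sets
```

For example, with five output bits of weights 2, 1, 1/2, 1/4 and 1/8 and the illustrative values A = 2, 3/2, 5/4 and 9/8 for m = 4 through 7, the bit of weight 1 has the minterm list 5, 6, 7, while the bit of weight 2 has only minterm 4.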

Figure 10 is an annotated flowchart of the program (QS4) which actually produces the definitions of the output functions for Table 1.

For given values of r, n, $\alpha$, $\beta$, $\lambda$, $\gamma$ find the maximum $\Delta d$ which will satisfy the precision requirements everywhere on the P-D plot.
DELD $\equiv \Delta d$

Generate the array NDT(I) where NDT(I) is the numerator of the Ith value of d, where d = (I - M) * $\Delta d$ and M is a constant determined by the minimum value of d. Let MM1 be the number of elements in NDT.

    DO 290 I = 1, MM1

This loop increments the value of d. MM1 is the number of d values.

    D = NDT(I) * DELD

D $\equiv$ d

    Q = N

Set quotient digit value at N. Work from Q = N down to Q = 1.

Figure 10. Flowchart of QS4 Algorithm
    ERPP = 1./DELRP
    ERPN = 0

or

    ERPP = 1./DELRP
    ERPN = ERPP

Define maximum truncation error in rp.

    NP1 = N + 1
    NM1 = N - 1

    DO 95 K = 1, N

    Q = NP1 - K

Work from Q = N down to Q = 1.

    J = D * (Q - NR) * DELRP

Find minimum rp for which transition interval could intersect d.

Note: DELRP = 1/$\Delta rp$.

Figure 10 (continued). Flowchart of QS4 Algorithm
Connector 200

    RP = J/DELRP
    RPU = RP + ERPP
    DL = RPU/(Q - 1 + NR)

Find left end, DL, of divisor transition interval for present RP.
ERPP $\equiv \gamma$
DL $\equiv d_l$
$\alpha = 0$

Yes:

    IQ = Q
    DIMIN(IQ) = (Q - .5)/RP

RPTOP has been found. IQ is an integer version of Q.

    J = J - 1

Move down to next lower rp.

    RP = J/DELRP
    RPL = RP - ERPN
    DR = RPL/(Q - NR)

Find right end, DR, of divisor transition interval for present RP.

No:

    J = J + 1

Figure 10 (continued). Flowchart of QS4 Algorithm
    DIMAX(IQ) = (Q - 0.5)/RP

RPBOT has been found.

    For J = 1, N find
    LBMAX = max (DIMIN(J))
    UBMIN = min (DIMAX(J))

    CALL DT (LBMAX, UBMIN, DIN(I), DID(I))

Subroutine DT finds a value for the inverse of D, DI, such that DI = DIN(I)/DID(I), LBMAX $\le$ DI $\le$ UBMIN, and DI is the simplest binary fraction in the interval.

    TN = DIN(I)
    TD = DID(I)

Figure 10 (continued). Flowchart of QS4 Algorithm
    DO 68 K = 1, 12

This DO-Loop assigns each minterm corresponding to a d value to the appropriate output functions.

    IP(K) = IP(K) + 1
    A(K,IP(K)) = NDT(I)

    TN = 2 * TN

    IP(K) = IP(K) + 1
    A(K,IP(K)) = NDT(I)

If D = NDT(I) * DELD implies bit K of the output is 1, then NDT(I) is added to the minterm list for A(K). IP(K) is the pointer for the Kth list.

Figure 10 (continued). Flowchart of QS4 Algorithm
4.3.2 Minimizing the Output Functions

The same techniques used to minimize the output functions of Table 2 are used to minimize the output functions of Table 1. These were described in Section 4.2.2.


5.   RESULTS  FROM  DESIGN  PROGRAMS 

5.1  Preliminary  Remarks 

The series of computer runs of the design and analysis routines described in the last chapter gave rise to four types of results. First, the algorithms produced numerical results for the cost of implementing Table 1 or Table 2 for various values of design parameters. In retrospect, however, it appears that the value of the computer lay more in insight than in numbers. Second, studying the numerical results gave rise to some theoretical results with which to attack the problem of determining cost without actual design.

A  third  result  was  a  discrepancy.   For  some  parameter  values  the 
theoretical  results  and  the  results  obtained  from  the  computer-aided  synthesis 
were  in  disagreement.   Closer  study  revealed  a  weakness  in  the  QS3  algorithm. 
The  fourth  and  final  result  of  the  work  to  date  was  therefore  an  improved 
algorithm  for  designing  Table  2. 

5.2  Numerical  Results  from  Design  Programs 

5.2.1  Cost  of  Table  2  for  Type  2  Structure 

Considering the large number of possible combinations of parameter values, even if restricted to practical cases, very few designs were actually generated in the present work. After generating the cost data for Table 2 with r = 16, n = 10, a = 1/2, b = 1, $\gamma = \lambda = 1/16$, $\alpha = 0$, and $\beta = 1/256$, sufficient insight was gained to propose an analytic expression for the cost of implementing each quotient region of the table. Two additional runs of the Table 2 routines with different parameter values tended to substantiate the predicted costs, but several points stood out as discrepancies. In attempting to reconcile the disagreement, a flaw in the QS3 algorithm was discovered: the selection of divisor transition values as the simplest binary number in the transition interval does not necessarily produce a minimum cost design. In view of this flaw, further runs of the algorithm were not justified. The major emphasis was shifted to developing a reasonable derivation of an analytic cost expression and to developing an algorithm which would in fact yield correct results which could be used to verify the expression. The parameter values selected correspond to practical cases. Let r denote the radix and assume a multiplication structure in which the following multiples of the multiplicand are available: ±1 or ±2, ±4 or ±8, ±16 or ±32, ..., ±r/4 or ±r/2. Each of the groups, such as ±4 or ±8, corresponds to a two-way shift gate. Only one of the two multiples in a group may be selected at a time. The magnitude of the maximum multiple which may be formed, n, is therefore 2 + 8 + 32 + ... + r/2 = 2(r - 1)/3. Since the same structure is used for division, the maximum quotient digit is also n and therefore in the cases studied, n = 2(r - 1)/3 and thus the redundancy ratio, $\rho$, is 2/3.
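The relation n = 2(r - 1)/3 can be checked numerically by summing the larger multiple of each two-way shift group. The following Python sketch assumes the radix is a power of 4 and takes the redundancy ratio as n/(r - 1); the function names are hypothetical.

```python
from fractions import Fraction


def max_quotient_digit(radix):
    """Sum the larger multiple of each shift group: 2 + 8 + 32 + ... + radix/2.
    Assumes the radix is a power of 4."""
    total, multiple = 0, 2
    while multiple < radix:
        total += multiple
        multiple *= 4
    return total


def redundancy_ratio(radix):
    """Redundancy ratio n/(r - 1) for the maximum quotient digit above."""
    return Fraction(max_quotient_digit(radix), radix - 1)
```

For radix 4, 16 and 64 the maximum digit is 2, 10 and 42, respectively, and the redundancy ratio is 2/3 in each case.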

As mentioned earlier, the study was restricted to the first quadrant of the P-D plot. The divisor ranges considered were the binary normalized case in which $1/2 \le d < 1$, and a second case for which $3/4 \le d < 9/8$. This second case corresponds to a case in which a divisor in the range $1/2 \le d < 1$ is multiplied by 3/2 if d < 3/4.

The maximum truncation errors in rp, $\gamma$ and $\lambda$, are initially set to the maximum value for which the criterion in Section 2.3 is satisfied, 1/16. Error was assumed in both directions so that the results would be applicable to symmetric adders or subtracters [10].

The divisor is strictly positive and non-redundantly represented; thus $\alpha = 0$. The positive truncation error was the maximum necessary to satisfy the selection criterion (Section 2.3) everywhere on the P-D plot for the given values of $\gamma$ and $\lambda$.

Table 2 summarizes the cost computations for a Table 2 structure with r = 16, n = 10, a = 1/2, b = 1, $\gamma = 1/16$, $\lambda = 1/16$, $\alpha = 0$, and $\beta = 1/256$. Radix 16 was selected as sufficiently large to be interesting but not so large as to demand great expense of computer time. Table 3 presents corresponding results for divisors in the range $3/4 \le d < 9/8$. No cost values are given for the upper quotient region, q(n). These regions were not minimized since the results would be highly inaccurate without the ability to include don't care minterms. The upper boundary of q(n) need not be implemented since the range restrictions imposed by the division algorithm prohibit (d, rp) values from occurring above the q(n) region. All minterms corresponding to points above the line rp = (n + $\rho$) d are therefore don't care minterms which sharply reduce the cost of implementing the adjacent q(n) region.

Note that the cost of a Table 2 structure for r = 4, n = 2 is also contained within Table 2. Neglecting the upper region q(2), the cost is the cost of q(0) + q(1) for radix 16 less 2 literals per required prime implicant.
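The radix 4 cost implied by this remark can be worked out from the q(0) and q(1) rows of Table 2. The short Python sketch below uses the prime implicant and literal counts read from those rows; the function names are hypothetical, and reading the adjustment as 2 literals times the total number of prime implicants is an interpretation of the sentence above.

```python
# Rows of Table 2 for q(0) and q(1): (prime implicants, total literals).
RADIX16_ROWS = {0: (4, 31), 1: (13, 109)}


def radix4_cost(rows, literals_saved_per_implicant=2):
    """Literal cost of q(0) + q(1), less a fixed saving per prime implicant."""
    implicants = sum(npi for npi, _ in rows.values())
    literals = sum(nl for _, nl in rows.values())
    return literals - literals_saved_per_implicant * implicants


def average_fan_in(npi, nl):
    """Average number of literals per prime implicant."""
    return nl / npi
```

Under these assumptions the radix 4 cost is 31 + 109 - 2(4 + 13) = 106 literals. The same rows also reproduce the average fan-in column, for example 31/4 = 7.75 for q(0).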
Table 2. Summary of Cost Calculations for Table 2 with
r = 16, n = 10, a = 1/2, b = 1, $\gamma$ = 1/16, $\lambda$ = 1/16, $\alpha$ = 0, $\beta$ = 1/256.

Columns: q-Region; No. of Bits in rp; Min. No. of Bits in d to Define the Region; Min. No. of Minterms Required to Define the Region, M'(i) (Est., Act.); Min. No. of Prime Implicants to Define Region; No. of Literals to Define Region, C'(i) (rp, d, Total); Average Fan-in, F'(i).

q-Region  Bits    Bits    M'(i)   M'(i)   Prime        C'(i)   C'(i)   C'(i)   F'(i)
          in rp   in d    Est.    Act.    Implicants   rp      d       Total
0         8       2       12      12      4            25      6       31      7.75
1         8       4       96      99      13           82      27      109     8.38
2         8       5       192     195     21           138     62      200     9.52
3         8       6       384     384     36           236     129     365     10.14
4         8       6       384     385     45           296     190     486     10.80
5         8       7       768     765     60           389     269     658     10.96
6         8       7       768     774     72           464     334     798     11.08
7         8       7       768     764     84           541     424     965     11.49
8         8       7       768     771     96           627     507     1134    11.81
9         8       8       1536    1526    109          711     584     1295    11.88
Totals                                    540          3509    2532    6041
Table 3. Summary of Cost Calculations for Table 2 with
r = 16, n = 10, a = 3/4, b = 9/8, $\gamma$ = 1/16, $\lambda$ = 1/16, $\alpha$ = 0, $\beta$ = 1/128.

Columns: q-Region; No. of Bits in rp; Min. No. of Bits in d to Define the Region; Min. No. of Minterms Required to Define the Region, M'(i) (Est., Act.); Min. No. of Prime Implicants to Define Region; No. of Literals to Define Region, C'(i) (rp, d, Total); Average Fan-in, F'(i).

q-Region  Bits    Bits    M'(i)   M'(i)   Prime        C'(i)   C'(i)   C'(i)   F'(i)
          in rp   in d    Est.    Act.    Implicants   rp      d       Total
0         8       4       24      24      2            10      7       17      8.50
1         8       4       45      44      7            40      26      66      9.43
2         8       5       90      91      14           84      57      141     10.07
3         8       6       180     180     23           139     101     240     10.43
4         8       6       180     181     27           166     127     293     10.85
5         8       7       360     359     32           199     160     359     11.22
6         8       7       360     362     40           246     212     458     11.45
7         8       7       360     358     54           332     301     633     11.72
8         8       7       360     363     54           339     305     644     11.93
9         8       7       360     358     61           383     351     734     12.03
Totals                                    314          1938    1647    3585
5.2.2 Cost of Table 1 for Type 1 Structure

The design of Table 1 is considerably less complicated than that of Table 2 since it is a function of only one input rather than two. The costs for radix 4, 16, and 64 were generated and are summarized in Table 4. The complexity of the table is adequate to produce a quotient digit in the leading


bits  of  the  product  A  •  rp,  where  A  =  f(d)  and  is  of  the  form  a 


^0 


Table 4. Summary of Cost Calculations for Table 1 with
a = 1/2, b = 1, $\gamma$ = 1/16, $\lambda$ = 1/16, $\alpha$ = 0.

Note: NPI = Minimum Number of Prime Implicants
      NL  = Minimum Number of Literals

          r = 4, n = 2       r = 16, n = 10     r = 64, n = 42
Output    $\beta$ = 1/16     $\beta$ = 1/256    $\beta$ = 1/1024
Bit       NPI     NL         NPI     NL         NPI     NL
                  1          1       1          1       1
                  4          3       8          4       13
                  7          8       29         9       38
                  7          12      56         18      91
                             16      79         28      153
                             19      95         41      239
                             18      109        63      401
                                                80      554
                                                80      579
                                                9       70
Totals            19         77      377        333     2139
5.3 Analytic Results Concerning Cost of Table 2

5.3.1 Preliminary Remarks

Figure 11 is a plot of cost in literals of implementing q(i) versus i for results given in Table 2. To a first approximation the cost varies linearly with i. This observation led to a comparison of the empirical results with the theoretical, indirect measure of the cost of selection of quotient digits suggested by Robertson [5]. This cost function exhibits a similar behavior with i. In the following we will review aspects of Robertson's work, suggest extensions and then propose an expression for the cost of implementing Table 2 as a function of design parameters.

Figure 11. Cost of Implementing q(i)-Region vs. i for Data in Table 2.
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5.3.2  Definition  of  s.,  s!,  and  s'.' 

1   i i_ 

In Robertson's work the design problem is presented as that of
choosing comparison constants against which $rp_j$ is compared and of determining
the divisor range for which each comparison constant is valid. The proposed
measure of cost of selecting between q(i) and q(i-1) is the minimum number of
comparison constants required to cover the given range of the divisor.

The selection ratio, $\sigma_i$, is first defined. It is the ratio of the
slope of the line defining the lower boundary of q(i) to the slope of the line
defining the upper boundary of q(i-1), i.e.,

$$\sigma_i = \frac{i - \rho}{i - 1 + \rho}.$$

The selection ratio is a relative measure of the width of the divisor interval
for which a single comparison constant is valid. The minimum number of divisor
intervals required to correctly distinguish between q = i and q = i-1
corresponds to the number of treads in the staircase between the upper boundary
of q(i-1) and lower boundary of q(i).

Let $s_i$ denote the minimum number of steps required to span the overlap
region between q(i) and q(i-1) for the divisor range a to b, as shown in
Figure 12. The slope of the upper boundary is $v = i - 1 + \rho$ and the slope of the
lower boundary is $w = i - \rho$. Let $\Delta_1$ be the width of the rightmost tread, $\Delta_2$ be
the width of the second tread (moving from right to left), etc. The quantity,
h, is the height of the riser between tread 1 and tread 2.


By definition

$$w = h / \Delta_1, \tag{5.2}$$

$$v = h / \Delta_2, \tag{5.1}$$

and thus

$$\Delta_2 = \sigma_i \Delta_1. \tag{5.3}$$
Figure 12. Graphical Interpretation of $s_i$ (boundary slopes $v = i - 1 + \rho$
and $w = i - \rho$).

In general,

$$\Delta_k = \sigma_i \Delta_{k-1} = \sigma_i^{(k-1)} \Delta_1. \tag{5.4}$$

By definition

$$\sum_{k=1}^{s_i} \sigma_i^{(k-1)} \Delta_1 \ge b - a. \tag{5.5}$$

The left side of Equation 5.5 is the sum of a geometric series and thus

$$\Delta_1 \frac{\sigma_i^{s_i} - 1}{\sigma_i - 1} \ge b - a. \tag{5.6}$$
Since $\Delta_1 = b(1 - \sigma_i)$, $s_i$ is the smallest integer that satisfies

$$\sigma_i^{s_i} \le a/b. \tag{5.7}$$

For present purposes, consider $s_i$ to be a continuous variable, rather
than an integer. Then,

$$s_i = \log(a/b) / \log \sigma_i. \tag{5.8}$$
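The tread count of Equations 5.7 and 5.8 can be checked against a direct walk along the staircase. The sketch below is illustrative only: the function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an implementation choice made here so that cases in which a tread ends exactly at d = a are not disturbed by rounding.

```python
import math
from fractions import Fraction

def selection_ratio(i, rho):
    """sigma_i = (i - rho) / (i - 1 + rho), the ratio of the boundary slopes w / v."""
    return (i - rho) / (i - 1 + rho)

def treads_by_construction(i, rho, a, b):
    """Count treads by walking the staircase leftward from d = b.

    With full precision, a tread whose right end is at d reaches the upper
    boundary of q(i-1) at sigma_i * d; the walk stops once d has reached a.
    """
    sigma = selection_ratio(i, rho)
    d, count = b, 0
    while d > a:
        d *= sigma
        count += 1
    return count

def treads_by_formula(i, rho, a, b):
    """Continuous s_i of Equation 5.8; its ceiling is the integer tread count."""
    sigma = float(selection_ratio(i, rho))
    return math.log(float(a) / float(b)) / math.log(sigma)

if __name__ == "__main__":
    rho, a, b = Fraction(2, 3), Fraction(1, 2), Fraction(1)
    for i in range(1, 6):
        print(i, treads_by_construction(i, rho, a, b),
              round(treads_by_formula(i, rho, a, b), 3))
```

For the values used in the demonstration loop, the constructed count equals the ceiling of the continuous value wherever the latter is not already an integer.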

We will now change the expression for $s_i$ into a form which makes apparent the
linear behavior with i. By the properties of logarithms

$$\log \sigma_i = \log(i - \rho) - \log(i - 1 + \rho) = \log(1 + x) - \log(1 - x), \tag{5.9}$$

where $x = (1 - 2\rho) / (2i - 1)$.

With $\rho$ restricted to the range $1/2 \le \rho < 1$, then $-1 < x < 1$ and thus a
series form of $\log(1 + x) - \log(1 - x)$ may be used. Therefore,

$$\log \sigma_i = 2\left(x + \frac{x^3}{3} + \frac{x^5}{5} + \cdots + \frac{x^{2m-1}}{2m-1} + \cdots\right) = 2x + \text{h.o.t.}, \tag{5.10}$$

and thus,

$$s_i \approx \log(b/a)\,(i - 1/2) / (2\rho - 1). \tag{5.11}$$
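How quickly the linear form takes over can be seen numerically. The following sketch compares the exact logarithmic expression with the first-order approximation; the function names are hypothetical and the parameter values in the loop (redundancy 2/3, divisor range 1/2 to 1) are chosen only for illustration.

```python
import math

def s_exact(i, rho, a, b):
    """s_i = log(a/b) / log(sigma_i), with sigma_i = (i - rho) / (i - 1 + rho)."""
    sigma = (i - rho) / (i - 1 + rho)
    return math.log(a / b) / math.log(sigma)

def s_linear(i, rho, a, b):
    """First-order form: log(b/a) * (i - 1/2) / (2*rho - 1)."""
    return math.log(b / a) * (i - 0.5) / (2 * rho - 1)

if __name__ == "__main__":
    for i in range(1, 11):
        exact = s_exact(i, 2 / 3, 0.5, 1.0)
        approx = s_linear(i, 2 / 3, 0.5, 1.0)
        print(f"i={i:2d}  exact={exact:7.3f}  linear={approx:7.3f}")
```

The neglected higher-order terms are of order $x^3$, so the relative error shrinks rapidly as i grows.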


The quantity, $s_i$, as defined so far is based upon the assumption of
full precision in the representation of the divisor and partial remainder. The
expression for $s_i$ will now be modified to yield the minimum number of steps
required to traverse the transition region between q(i) and q(i-1) when only
estimates of rp and d are available, $\widehat{rp}$ and $\hat{d}$ respectively. Assume as before
that $\widehat{rp}$ is representative of rp-values in a range given by
$\widehat{rp} - \lambda \le rp \le \widehat{rp} + \gamma$, and that $\hat{d}$ is representative of d-values in the
range $\hat{d} - \alpha \le d \le \hat{d} + \beta$. For the time being, assume that $\widehat{rp}$ and $\hat{d}$ may
assume any value, not merely discrete values.

If we consider the staircase to be the upper boundary of the $(\hat{d}, \widehat{rp})$
values defining the q(i-1) region, then for all pairs, $(\hat{d}_{i-1}, \widehat{rp}_{i-1})$, defining
the risers and treads, the restriction

$$\widehat{rp}_{i-1} + \gamma \le v(\hat{d}_{i-1} - \alpha) \tag{5.12}$$

must hold. Thinking of the staircase as the lower boundary of values defining
the q(i) region, then for all pairs $(\hat{d}_i, \widehat{rp}_i)$ defining the risers and treads,
the restriction

$$\widehat{rp}_i - \lambda \ge w(\hat{d}_i + \beta) \tag{5.13}$$

must hold.

Since adjacent values of $\widehat{rp}$ are separated by $\Delta rp$ and adjacent values
of $\hat{d}$ are separated by $\Delta d$,

$$\hat{d}_i = \hat{d}_{i-1} - \Delta d, \text{ and} \tag{5.14}$$

$$\widehat{rp}_i = \widehat{rp}_{i-1} + \Delta rp. \tag{5.15}$$

The staircase must satisfy both restrictions 5.12 and 5.13 subject to
Equations 5.14 and 5.15. Substituting Equations 5.14 and 5.15 into 5.13 yields
another restriction in terms of $\widehat{rp}_{i-1}$ and $\hat{d}_{i-1}$, namely

$$\widehat{rp}_{i-1} + \Delta rp - \lambda \ge w(\hat{d}_{i-1} - \Delta d + \beta). \tag{5.16}$$

For a given value of $\widehat{rp}_{i-1}$ the maximum tread width is therefore the distance
between the intersection of the line $rp = \widehat{rp}_{i-1}$ and the lines

$$rp = w(d - \Delta d + \beta) + \lambda - \Delta rp, \text{ and} \tag{5.17}$$

$$rp = v(d - \alpha) - \gamma. \tag{5.18}$$
Figure 13. Graphical Interpretation of $s'_i$. Boundary slopes: $v = i - 1 + \rho$,
$w = i - \rho$. Lines shown: (1) $rp = vd$; (2) $rp = v(d - \alpha) - \gamma$;
(3) $rp = w(d - \Delta d + \beta) + \lambda - \Delta rp$; (4) $rp = wd$.


Figure 13 is a graphical interpretation of the minimum step boundary
between q(i) and q(i-1) for this non-precise case.

The effect of the imprecision on $s_i$ may be thought of as shifting
the divisor range of the P-D plot by an amount, $d'$, given by

$$d' = \frac{\lambda + \gamma - \Delta rp + v\alpha + w(\beta - \Delta d)}{2\rho - 1}. \tag{5.19}$$

The value of $s_i$ in this case, denoted $s'_i$, is given by

$$s'_i = \frac{\log\left((a - d') / (b - d')\right)}{\log \sigma_i}. \tag{5.20}$$
Note that this equation is equivalent to replacing a by a-d' and b by b-d' in
Equation 5.8. This may be verified by replacing $\Delta_1$ in Equation 5.6 by the
appropriate expression in the present case, namely by

$$\Delta_1 = b - \frac{w(b + \beta - \Delta d) + (\gamma + \lambda - \Delta rp)}{v} - \alpha. \tag{5.21}$$

Geometrically, d' is the value of d at the intersection of the lines
defined by Equations 5.17 and 5.18.
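The geometric statement can be confirmed in one step. Equating the right-hand sides of the two boundary lines (5.17) and (5.18) at $d = d'$ and collecting terms gives the following; this is a worked check added for the reader, not part of the original derivation.

$$
\begin{aligned}
w(d' - \Delta d + \beta) + \lambda - \Delta rp &= v(d' - \alpha) - \gamma \\
(v - w)\, d' &= \lambda + \gamma - \Delta rp + v\alpha + w(\beta - \Delta d) \\
d' &= \frac{\lambda + \gamma - \Delta rp + v\alpha + w(\beta - \Delta d)}{2\rho - 1},
\end{aligned}
$$

since $v - w = (i - 1 + \rho) - (i - \rho) = 2\rho - 1$. Measured from $d'$, both lines pass through a common point, so successive tread widths again stand in the ratio $w/v$, and the full-precision argument carries over with a and b replaced by $a - d'$ and $b - d'$.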

Equation 5.19 implies that it is not merely the imprecision but
rather the redundancy in the representation of rp and d which increases the
number of treads in the boundary staircase. First, note that to insure
covering, i.e. that every value of rp and d map into at least one $\widehat{rp}$ and $\hat{d}$,
respectively, the inequalities

$$\lambda + \gamma - \Delta rp \ge 0, \text{ and} \tag{5.22}$$

$$\alpha + \beta - \Delta d \ge 0 \tag{5.23}$$

must hold. This restriction forbids d' from being negative and thus $s'_i$ being
less than $s_i$. If $\lambda + \gamma - \Delta rp = 0$, $\alpha = 0$, and $\beta - \Delta d = 0$, then $s'_i = s_i$. This
corresponds to the case in which every rp and d value map into one and only one
$\widehat{rp}$ and $\hat{d}$, respectively.

In terms of the P-D plot this means that there is no overlap between
the areas represented by the pairs $(\hat{d}, \widehat{rp})$. In this case, even though $\widehat{rp} \ne rp$
and $\hat{d} \ne d$, the selection is theoretically no more complicated than in the full
precision case.

For the cases treated in this study $\lambda = \lambda' \Delta rp$, $\gamma = \gamma' \Delta rp$, $\alpha = 0$,
$\beta = \Delta d$, and thus

$$d' = \frac{\Delta rp\,(\lambda' + \gamma' - 1)}{2\rho - 1}. \tag{5.24}$$
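The shift of the divisor range and the resulting step count lend themselves to a short numerical check. The sketch below uses hypothetical function and argument names; the numerical values in the demonstration (estimate errors of 1/16 on the partial remainder, redundancy 2/3, divisor range 1/2 to 1) are illustrative assumptions.

```python
import math

def d_shift(i, rho, lam, gam, alpha, beta, d_rp, d_d):
    """Shift d' of the divisor range: the general form in lambda, gamma, alpha, beta."""
    v = i - 1 + rho          # slope of the upper boundary of q(i-1)
    w = i - rho              # slope of the lower boundary of q(i)
    return (lam + gam - d_rp + v * alpha + w * (beta - d_d)) / (2 * rho - 1)

def s_prime(i, rho, a, b, shift):
    """Step count with imprecise operands: a and b are both reduced by the shift."""
    sigma = (i - rho) / (i - 1 + rho)
    return math.log((a - shift) / (b - shift)) / math.log(sigma)

if __name__ == "__main__":
    rho, a, b = 2 / 3, 0.5, 1.0
    shift = d_shift(1, rho, 1 / 16, 1 / 16, 0.0, 1 / 256, 1 / 16, 1 / 256)
    print("d' =", shift)
    for i in range(1, 6):
        print(i, round(s_prime(i, rho, a, b, shift), 3))
```

When the estimate cells tile the plane without overlap (the sum of the two remainder errors equals the remainder grid step, the divisor error equals the divisor grid step), the shift is zero and the step count reduces to the full-precision value.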
The analysis so far has allowed for an error in representing d and rp
but has not restricted the value of $\hat{d}$ and $\widehat{rp}$. In practice these are formed by
truncation and therefore are restricted to integral multiples of $\Delta d = 2^{-\delta}$ and
$\Delta rp = 2^{-\epsilon}$, where $\delta$ and $\epsilon$ are the number of bits to the right of the binary point
in the representation of $\hat{d}$ and $\widehat{rp}$ respectively. The location of the treads and
risers of the actual staircase which can be implemented may therefore
simultaneously differ by as much as $\Delta d$ and $\Delta rp$, respectively. The maximum number of
steps (taking into account both error and discrete effects) required to define
the boundary between q(i) and q(i-1) may therefore be given by

$$s''_i = \frac{\log\left((a - d'') / (b - d'')\right)}{\log \sigma_i} \tag{5.25}$$

where

$$d'' = \frac{\lambda + \gamma - \Delta rp + 2^{-\epsilon} + v(\alpha + 2^{-\delta}) + w(\beta - \Delta d)}{2\rho - 1}. \tag{5.26}$$

The actual number of steps required, $s_{i,act}$, is therefore bounded by

$$s'_i \le s_{i,act} \le s''_i. \tag{5.27}$$

Equation 5.26 may be used to determine the minimum values of $\epsilon$ and $\delta$
required for a given P-D plot. The quantity, $s''_i$, and thus the cost, will tend
to infinity as d'' approaches a. To insure that every region of the P-D plot
may be correctly defined for given values of $\lambda$, $\gamma$, $\alpha$, $\beta$, the quantities $\epsilon$ and
$\delta$ therefore must be selected such that d'' < a.
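The feasibility condition d'' < a can be turned into a simple search over candidate precisions. The sketch below is a hedged illustration: the function names are hypothetical, the grid terms are taken as the remainder and divisor grid steps respectively, and the defaults (remainder errors equal to one grid step each, no negative divisor error, positive divisor error equal to one grid step) are assumptions matching the special case quoted in the text rather than a general design rule.

```python
def d_double_prime(i, rho, lam, gam, alpha, beta, eps_bits, delta_bits):
    """Shift d'' including discrete effects; grid steps are 2**-eps and 2**-delta."""
    d_rp = 2.0 ** -eps_bits
    d_d = 2.0 ** -delta_bits
    v = i - 1 + rho
    w = i - rho
    numerator = (lam + gam - d_rp) + d_rp + v * (alpha + d_d) + w * (beta - d_d)
    return numerator / (2 * rho - 1)

def boundaries_definable(n, rho, a, eps_bits, delta_bits, lam_p=1.0, gam_p=1.0):
    """True when d'' < a for every boundary between q(i-1) and q(i), i = 1..n."""
    d_rp = 2.0 ** -eps_bits
    d_d = 2.0 ** -delta_bits
    return all(
        d_double_prime(i, rho, lam_p * d_rp, gam_p * d_rp, 0.0, d_d,
                       eps_bits, delta_bits) < a
        for i in range(1, n + 1)
    )

if __name__ == "__main__":
    # Smallest divisor precision that keeps every boundary definable
    # for 4 remainder fraction bits and digits up to 10 (illustrative values).
    for delta_bits in range(4, 12):
        if boundaries_definable(10, 2 / 3, 0.5, 4, delta_bits):
            print("minimum delta bits:", delta_bits)
            break
```

Because the slope v grows with i, the largest quotient digit sets the requirement on the divisor precision in this sketch.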

5.3.3 An Estimate of Cost as a Function of $s'_i$

In this section we will hypothesize an expression for the cost of
implementing the q(i) region of a given P-D plot. Consider the region to be
defined by a set of minterms corresponding to the set of ordered pairs $(\hat{d}, \widehat{rp})$
for which q = i. Let $\Delta d$ for the region be $2^{-\delta(i)}$ and $\Delta rp$ for the region be
$2^{-\epsilon(i)}$. The number of minterms to define the region will be

$$M(i) = (b^2 - a^2)\, 2^{\epsilon(i) + \delta(i) - 1}. \tag{5.28}$$

The fan-in to each minterm, F(i), is given by

$$F(i) = \epsilon' + \delta' \tag{5.29}$$

where

$$\epsilon' = \log_2 r + \epsilon(i), \text{ and} \tag{5.30}$$

$$\delta' = I\left(\log_2 (b - 2^{-\delta(i)}) + 1\right) + \delta(i). \tag{5.31}$$

The term $I(\log_2 (b - 2^{-\delta}) + 1)$ is merely the number of bits of the divisor to
the left of the radix point. Recall that I(x) has been defined as the integer
portion of x.

The cost before minimization is given by

$$C_T(i) = M(i)\,F(i) + M(i). \tag{5.32}$$

The term MF is the number of literals in the AND gates, the term M is the
number of literals in the OR gate.

After minimization

$$C'_T(i) = M'(i)\,(F'(i) + 1) \tag{5.33}$$

where M'(i) is the number of prime implicants and F'(i) is the average fan-in
to each prime implicant.
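The unminimized cost is a direct product of the minterm count and the fan-in, which the following sketch evaluates. Function names are hypothetical; `int()` stands in for the integer-portion operator I(x), which is adequate here because its argument is non-negative for the divisor ranges considered.

```python
import math

def minterm_count(a, b, eps_bits, delta_bits):
    """M(i): area of the q(i) region, (b^2 - a^2)/2, divided by the cell area."""
    return (b * b - a * a) * 2 ** (eps_bits + delta_bits - 1)

def fan_in(r, b, eps_bits, delta_bits):
    """F(i): bits of the remainder estimate plus bits of the divisor estimate."""
    eps_prime = math.log2(r) + eps_bits
    delta_prime = int(math.log2(b - 2.0 ** -delta_bits) + 1) + delta_bits
    return eps_prime + delta_prime

def unminimized_cost(a, b, r, eps_bits, delta_bits):
    """C_T(i) = M*F (AND-gate literals) + M (OR-gate literals)."""
    m = minterm_count(a, b, eps_bits, delta_bits)
    return m * fan_in(r, b, eps_bits, delta_bits) + m

if __name__ == "__main__":
    # Illustrative values: radix 16, divisor range 1/2 to 1,
    # 4 remainder fraction bits and 8 divisor fraction bits.
    print(unminimized_cost(0.5, 1.0, 16, 4, 8))
```

The size of this figure relative to the minimized literal counts reported later in the section shows how much of the raw cost is removed by minimization.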

In order to obtain approximations of M'(i) and F'(i), we now approximate
the effects of minimization by the following algorithm.

Figure 14 illustrates a portion of a quotient region. Note that it
may be defined by a set of adjacent rectangles (denoted by heavy lines) each
of which is defined by a set of minterms (denoted by small squares). Consider
one of the rectangles of width W and height H. Assume that minimization
proceeds first in the d-direction by combining adjacent minterms which differ
by only the low order bit. If there were initially M minterms in the rectangle,
after the first step there are M/2 implicants. Next, the implicants which
differ only in the next to low-order position may combine to produce M/4
implicants, etc. The minimization in the d-direction continues for
$k_d = I(\log_2 W)$ steps to form $M / 2^{k_d}$ implicants. Similarly, combinations take
place in the rp-direction, further reducing the number of implicants to
$M / 2^{k_d + k_{rp}}$ where $k_{rp} = I(\log_2 H)$.

Figure 14. Model of the q(i) Region Used in
Approximating Effects of Minimization
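The halving argument can be written out for a single rectangle. This is a minimal sketch with hypothetical names; W and H are assumed to be given in units of grid cells, and `int()` plays the role of the integer-portion operator.

```python
import math

def merge_rectangle(width_cells, height_cells, fan_in):
    """Approximate the effect of minimization on one W-by-H block of minterms.

    Returns (implicants, average fan-in) after k_d halvings along d and
    k_rp halvings along rp; each halving removes one literal per implicant.
    """
    minterms = width_cells * height_cells
    k_d = int(math.log2(width_cells))
    k_rp = int(math.log2(height_cells))
    implicants = minterms / 2 ** (k_d + k_rp)
    return implicants, fan_in - (k_d + k_rp)

if __name__ == "__main__":
    print(merge_rectangle(8, 4, 16))   # power-of-two sides collapse to one implicant
    print(merge_rectangle(6, 4, 16))   # other widths leave a fractional remainder
```

The fractional result for widths that are not powers of two reflects that the model is an averaging approximation, not an exact covering procedure.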
The minimization of the quotient region will be characterized by an
average rectangle of dimensions W by H. The width is defined by

$$W = 2^{\delta} (b - a) / \bar{s}'_i \tag{5.34}$$

where

$$\bar{s}'_i = (s'_i + s'_{i+1}) / 2. \tag{5.35}$$

The quantity, W, is therefore the average width of the minimum-number treads
defining the upper and lower boundary of q(i). The height is defined by

$$H = 2^{\epsilon} (b + a) / 4 \tag{5.36}$$

which is the average value of the distance between rp = (i + 1/2) d (nominal
upper boundary) and rp = (i - 1/2) d (nominal lower boundary).

The preceding argument suggests a cost expression of the following
form:*

$$C'(i) = \theta_1 \frac{M(i)}{2^k} \left(F(i) - k + \theta_2\right) \tag{5.37}$$

where M and F are defined by Equations 5.28 and 5.29, respectively,
and k is defined by

$$k = k_d + k_{rp} = \log_2 WH. \tag{5.38}$$

The factors $\theta_1$ and $\theta_2$ are constants which will be determined
empirically. Equation 5.37 may be rewritten as

$$C'(i) = M'(i)\,F'(i) \tag{5.39}$$

where

$$M'(i) = 2\,\theta_1\,\bar{s}'_i, \text{ and} \tag{5.40}$$

$$F'(i) = \log_2 \frac{4\,\theta_2\, r\, \bar{s}'_i}{b^2 - a^2} + I\left(\log_2 (b - 2^{-\delta}) + 1\right). \tag{5.41}$$

M'(i) is the minimal number of prime implicants required to implement the
Boolean function for q(i) and F'(i) is the average fan-in to each prime
implicant.

*Note that C'(i) is the number of literals in the AND gates;
$C'_T(i) = C'(i) + M'(i)$ is the total number of literals for the region.

We now use numerical results from Tables 2 and 3 to find values for
$\theta_1$ and $\theta_2$ and to test the predictive worth of Equation 5.39. The value of $\theta_1$
is obtained by a least squares fit of the actual values of M'(i) to Equation
5.40. The value of $\theta_2$ is obtained by a least squares fit of the actual values
of F'(i) to Equation 5.41. Values of $\theta_1 = 2.12$ and $\theta_2 = 1.68$ were
obtained.
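With the fitted constants in hand, the predicted prime-implicant count, fan-in, and literal count follow from the averaged step count alone. The sketch below uses hypothetical function names. Its parameter choices (radix 16, redundancy 2/3, a divisor-range shift of 3/16 corresponding to remainder errors of one grid step of 1/16 each) are assumptions, adopted here because they yield averaged step counts of about 2.83 and 5.72 for i = 1 and i = 2. The second fitted constant is placed inside the logarithm of the fan-in expression.

```python
import math

THETA_1, THETA_2 = 2.12, 1.68   # fitted constants quoted in the text

def s_prime(i, rho, a, b, shift):
    """Step count with imprecise operands for the boundary q(i-1) / q(i)."""
    sigma = (i - rho) / (i - 1 + rho)
    return math.log((a - shift) / (b - shift)) / math.log(sigma)

def s_bar(i, rho, a, b, shift):
    """Average of the step counts of the lower and upper boundaries of q(i), i >= 1."""
    return 0.5 * (s_prime(i, rho, a, b, shift) + s_prime(i + 1, rho, a, b, shift))

def predicted_cost(i, rho, a, b, shift, r, delta_bits):
    """Return (s_bar, M', F', C') from the estimating equations."""
    sb = s_bar(i, rho, a, b, shift)
    m_min = 2 * THETA_1 * sb
    f_min = (math.log2(4 * THETA_2 * r * sb / (b * b - a * a))
             + int(math.log2(b - 2.0 ** -delta_bits) + 1))
    return sb, m_min, f_min, m_min * f_min

if __name__ == "__main__":
    for i in range(1, 10):
        sb, m_min, f_min, c_min = predicted_cost(i, 2 / 3, 0.5, 1.0, 3 / 16, 16, 8)
        print(f"i={i}  s_bar={sb:6.2f}  M'={m_min:6.1f}  F'={f_min:5.2f}  C'={c_min:7.1f}")
```

Under these assumptions the values for i = 1 are roughly 12 prime implicants with an average fan-in near 8.7, or about 104 literals in the AND gates.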

Table 5 summarizes the results of the fit. Figures 15, 16, and 17
display the results graphically with $\bar{s}'_i$ as the independent variable. The heavy
line denotes the predicted values; the circles denote actual values.

Note that Equations 5.40 and 5.41 do not explicitly account for the
discrete effects resulting from the fact that the treads and risers of the
q-region boundaries are restricted to integer multiples of $2^{-\epsilon}$ and $2^{-\delta}$,
respectively. The effect is included empirically in the choice of $\theta_1$ and $\theta_2$.
There are indications that a more explicit cost function of both $s'_i$ and $s''_i$,
which does include discrete effects, might be found. For present purposes,
however, the estimates given by Equations 5.40 and 5.41 were judged to be
adequate.
Table 5. Results of Least Squares Fit of M'(i), F'(i), and C'(i)
for Data from Table 2.

                        M'(i)              F'(i)              C'(i)
  i    $\bar{s}'_i$   Equation   QS3     Equation   QS3     Equation   QS3

a = 1/2, b = 1

  0      1.38           5         4        7.6      7.7        44       31
  1      2.83          12        13        8.6      8.4       103      109
  2      5.72          24        21        9.6      9.5       234      200
  3      8.59          36        36       10.2     10.1       373      365
  4     11.46          48        45       10.6     10.8       519      486
  5     14.33          60        60       11.0     11.0       668      658
  6     17.20          72        72       11.2     11.0       821      798
  7     20.06          85        84       11.4     11.5       977      965
  8     22.93          97        96       11.6     11.8      1135     1134
  9     25.80         109       109       11.8     11.8      1296     1295

a = 3/4, b = 9/8

  0      0.74           3         2        7.8      8.5        24       17
  1      1.51           6         7        8.9      9.4        56       66
  2      3.06          13        14        9.9     10.0       127      141
  3      4.60          19        23       10.5     10.4       203      240
  4      6.13          26        27       10.9     10.9       282      293
  5      7.66          32        32       11.2     11.2       363      359
  6      9.19          39        40       11.5     11.5       446      458
  7     10.72          45        54       11.7     11.7       531      633
  8     12.26          52        54       11.9     11.9       617      644
  9     13.79          58        61       12.0     12.0       704      734
Figure 15. M'(i) versus $\bar{s}'_i$.

Figure 16a. F'(i) versus $\bar{s}'_i$.

Figure 16b. F'(i) versus $\bar{s}'_i$.

Figure 17a. C'(i) versus $\bar{s}'_i$.

Figure 17b. C'(i) versus $\bar{s}'_i$.
5.3.4 Discrepancies

The two cases for which numerical results were presented in Section
5.2.1 differ only in the range of the divisor. We should also consider the
effect of varying the precision in the estimates of the operands. The program,
QS3, was therefore also run for the same parameter values as listed in Table 2
(Section 2.5.1) except that $\Delta rp$, $\gamma$, and $\lambda$ were decreased from 1/16 to 1/32.
The minimized results are shown in Table 6. Numbers under the heading
'Equation' are from the evaluation of Equation 5.39; numbers under the heading
'QS3' are from the QS3 and minimization programs.

Table 6. Comparison of Results from Estimating Equation
and the QS3 Program for $\Delta rp$ = 1/32.

                        M'(i)              F'(i)              C'(i)
  i    $\bar{s}'_i$   Equation   QS3     Equation   QS3     Equation   QS3

  0      1.16           5         3        7.37     7.66       36       23
  1      2.38          10        10        8.41     8.20       84       82
  2      4.80          20        20        9.43     9.65      191      193
  3      7.21          31        34       10.01    10.02      306      346
  4      9.62          41        44       10.43    10.8       425      476
  5     12.03          51        62       10.75    10.9       548      679
  6     14.44          61        67       11.02    11.4       674      749
  7     16.85          71        84       11.24    11.5       802      970
  8     19.25          82        90       11.43    11.9       933     1067
  9     21.66          92       110       11.60    11.8      1065     1303

In Figure 18, the data from the C'(i)-QS3 column of Table 6 have
been added (denoted by X's) to Figure 17(a). Note that these X-points start
near the predicted values (solid line) but increasingly fall above the
expected values.

Figure 18. C'(i) versus $\bar{s}'_i$ for $\Delta rp$ = 1/16 and $\Delta rp$ = 1/32.
The source of this discrepancy turns out not to be the predictive
equations, as might be first suspected, but rather the QS3 algorithm;
specifically the decision to pick divisor transition values as the simplest binary
fraction in the allowable interval. This choice was made in the early stages
of the research when other measures of cost were being used and in changing
to the minterm approach it was not evaluated critically. Fortunately, as
will be explained, it was possible to salvage the numerical results produced
by QS3. A correct algorithm has also been found and is described in the
Appendix.

The essence of the problem is the failure to fully appreciate the
two-dimensional nature of the minimization problem. For several of the
q-regions which produced doubtful results, the areas corresponding to the prime
implicants of the reduced function were drawn on a P-D plot. The upper and
lower stairstep boundaries were therefore made apparent.

By close inspection of the boundaries, it could be seen that the
decision to force the location of risers to the simplest binary fraction
sometimes over-constrained the location of the tread. In other words, in some
cases for which a divisor interval would have been spanned with one tread, the
algorithm generated two treads. Furthermore, each of these extra treads
required an extra prime implicant to define it. Thus, although the output
function was minimal for the given definition of the q-region, the given
definition of the q-region was unduly complicated and therefore not truly
minimal. By manually revising the boundary to eliminate the superfluous prime
implicants, it was found that the cost was reduced to close agreement with
the predicted values.
But the constants in the equation for estimating cost, $\theta_1$ and $\theta_2$,
were specified based upon results from the QS3 program. Why should they be
trusted? The answer to this question is found in the following argument.

If we think of the transition region between q(i) and q(i-1) as
being defined by a grid of vertical spacing, $\Delta rp$, and horizontal spacing, $\Delta d$,
then the set of all boundaries between q(i) and q(i-1) is all stairsteps
which can be drawn along these grids and still remain inside the transition
region. As $\Delta d$ and $\Delta rp$ are decreased the number of different boundaries
increases exponentially. The problem is to pick boundaries that will minimize
the number of literals in the Boolean function defining the area enclosed
by the boundaries. (Such an algorithm is described in the Appendix.)
Fortunately for the parameter values used to derive the constants $\theta_1$ and $\theta_2$,
there was very little choice in selecting the boundaries due to the dimensions
of the transition regions. It is, therefore, asserted that the boundary
produced by the QS3 algorithm and a correct algorithm would be very nearly the
same. A graphical spot check of several of the boundaries confirmed this
assertion. When, however, $\Delta rp$ was reduced from 1/16 to 1/32 the number of
possible boundaries increased and thus the discrepancy became apparent.

There is one other case for which a discrepancy is apparent. In
Table 5 for a = 3/4, b = 9/8, and i = 7, notice that M'(i) from QS3 is 54
while the predicted value is 45. This difference accounts for the high points
at $\bar{s}'_i$ = 10.72 in Figures 15 and 17(b). The prime implicant covering for this
case (q(7)) was drawn and it was thus discovered that six extra prime
implicants had been generated. In this case, although $\Delta rp$ is also 1/16, the
shifting of the divisor range to the right increases the width of the
transition region to the extent that the QS3 algorithm may fail badly for d values
near the upper limit, b. Fortunately, it did not except in the one region.
5.4 Analytic Results Concerning Cost of Table 1

5.4.1 Preliminary Remarks

The program, QS4, produces a cost estimate of Table 1 for a Type 1
structure for which the precision of Ad is such that the rounded, integer
portion of Ad is a correct quotient digit. As mentioned in Section 2.4, we
are also interested in hybrid structures in which Table 1 and the multiples
are used to transform the divisor and remainders before they are applied to
Table 2. In the following sections we consider the effect of the
transformation on the design parameters for Table 2 and then propose an expression to
estimate the cost of implementing Table 1 for given precision in A and d.

5.4.2 Worst Case Bounds on Transformed Parameters

As in Section 2.2, assume that we are given $\hat{d}$ which is representative
of divisor values in the range $\hat{d} - \alpha \le d \le \hat{d} + \beta$ and are given $\widehat{rp}$
which is representative of remainders in the range $\widehat{rp} - \lambda \le rp \le \widehat{rp} + \gamma$.

Let $A = F(\hat{d})$ be generated by Table 1. The range of the transformed divisor,
$d^T$, now represented is given by

$$A\hat{d} - A\alpha \le d^T \le A\hat{d} + A\beta \tag{5.42}$$

and the range of the transformed remainder $rp^T$ is given by

$$A\,\widehat{rp} - A\lambda \le rp^T \le A\,\widehat{rp} + A\gamma. \tag{5.43}$$

The divisor range which must be accommodated by Table 2 is $(a^T, b^T)$,
where

$$a^T = (A\hat{d})_{min} - A_{max}\,\alpha, \text{ and} \tag{5.44}$$

$$b^T = (A\hat{d})_{max} + A_{max}\,\beta. \tag{5.45}$$
The worst-case transformed values of $\alpha$, $\beta$, $\lambda$, and $\gamma$ are merely $A_{max}\alpha$,
$A_{max}\beta$, $A_{max}\lambda$, and $A_{max}\gamma$. If $2^{-j}$ is the weight of the low order bit in A, then

$$\Delta d^T = \Delta d\; 2^{-j}, \text{ and} \tag{5.46}$$

$$\Delta rp^T = \Delta rp\; 2^{-j}. \tag{5.47}$$

Assuming that $A_{max} = 2$, then d' (Equation 5.19) becomes

$$d' = \frac{2\lambda + 2\gamma - 2^{-(\epsilon + j)} + 2v\alpha + w\left(2\beta - 2^{-(\delta + j)}\right)}{2\rho - 1}. \tag{5.48}$$

Assume that $\alpha = 0$ and that j is sufficiently large to permit the terms $2^{-(\epsilon + j)}$
and $2^{-(\delta + j)}$ to be neglected relative to $\lambda$, $\gamma$, $\alpha$, and $\beta$; then

$$d' \approx \frac{2\,\Delta rp\,(\lambda' + \gamma') + 2w\beta}{2\rho - 1}. \tag{5.49}$$

This value of d' for given $\lambda'$, $\gamma'$, $\Delta rp$, and $\beta$ is greater than d' as defined
in Equation 5.24. Furthermore, d' increases with i due to the $2w\beta$ term.
This comparison indicates that although the transformation reduces cost by
narrowing the divisor range for Table 2, it increases cost by increasing
restrictions on the q-region boundaries.
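The comparison between the direct and transformed shifts can be made concrete. In this sketch the function names are hypothetical and the numerical values in the demonstration (remainder grid step 1/16, unit error multipliers, divisor error 1/256, redundancy 2/3) are illustrative assumptions only.

```python
def shift_direct(d_rp, lam_p, gam_p, rho):
    """d' when Table 2 is applied to the untransformed operands (alpha = 0, beta = grid step)."""
    return d_rp * (lam_p + gam_p - 1) / (2 * rho - 1)

def shift_transformed(i, d_rp, lam_p, gam_p, beta, rho):
    """Approximate d' after multiplication by A, assuming A_max = 2, alpha = 0, large j."""
    w = i - rho
    return (2 * d_rp * (lam_p + gam_p) + 2 * w * beta) / (2 * rho - 1)

if __name__ == "__main__":
    print("direct:", shift_direct(1 / 16, 1, 1, 2 / 3))
    for i in range(1, 6):
        print(i, shift_transformed(i, 1 / 16, 1, 1, 1 / 256, 2 / 3))
```

The transformed shift exceeds the direct one for every i and grows linearly with i, which is the source of the tighter boundary restrictions noted above.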

The most difficult terms to evaluate in this analysis are $(A\hat{d})_{min}$
in Equation 5.44 and $(A\hat{d})_{max}$ in Equation 5.45. This is the subject of the
remainder of this section.

The design problem for Table 1 may be viewed as that of implementing
an estimate of the function $f(d) = d^{-1}$. In the following analysis
we shall treat divisors in the range $1/2 \le d < 1$. The approach adopted here
is to specify the precision in A, the estimates of $d^{-1}$, and then to determine
the precision in d required to guarantee that dA is within a certain interval
in the vicinity of one. The precision of A is selected as the independent
variable since it determines the number of additions required in forming the
product dA. The number of additions is the dominant factor in determining
the operating time of the T1, M1, M2 part of the quotient selector.


Let the set of discrete values of the output of Table 1 be defined
by

$$A = \{\, m\tau \,\} \tag{5.50}$$

where $\tau = 2^{-j}$ for some positive integer, j, and m is an integer ranging from
$1/\tau$ through $2/\tau$. The tick marks on the ordinate of Figure 19 designate such
a set for one value of $\tau$.

Figure 19. Geometry for Derivation of Estimates of $d^{-1}$
For every element of A we must define a divisor interval for which
$m\tau$ is used as the estimate of the reciprocal of divisor values in the
interval. Interpreted graphically, the elements of A determine the location of
the treads of a stairstep approximation to $d^{-1}$. The remaining task is to
specify the location of the risers (the dotted lines in Figure 19).

Let $d_{l,m}$ and $d_{r,m}$ denote the left and right ends respectively of the
divisor interval for which $A = m\tau$ is taken as the inverse of divisor values
in the range $d_{l,m} \le d \le d_{r,m}$. It may be shown that the optimum values
for $d_{l,m}$ and $d_{r,m}$ in the sense of minimizing the maximum value of $|1 - dA|$ are

$$d_{l,m} = \frac{2}{\tau\,(2m + 1)}, \text{ and} \tag{5.51}$$

$$d_{r,m} = \frac{2}{\tau\,(2m - 1)}. \tag{5.52}$$

These equations correspond to the reciprocal of the average value
of $\tau m$ and $\tau(m+1)$, and $\tau m$ and $\tau(m-1)$. For divisor values, d, in the range
$d_{l,m} \le d \le d_{r,m}$, the range of dA is given by

$$1 - e^{-}(m) \le dA \le 1 + e^{+}(m) \tag{5.53}$$

where

$$e^{+}(m) = 1 / (2m - 1), \tag{5.54}$$

$$e^{-}(m) = 1 / (2m + 1). \tag{5.55}$$

The negative error is maximum for $m = m_{min} = 1/\tau$, but since $1/2 \le d \le 1$,
the positive error, $e^{+}(m)$, is maximum at $m = m_{min} + 1$.

mm 

In practice $d_{l,m}$ and $d_{r,m}$ are also discrete values and thus, in
general, cannot be placed precisely as specified by Equations 5.51 and 5.52.
In this case the determination of the error bounds on the product dA is more
complicated.

If $d_{l,m}$ and $d_{r,m}$ are represented to $\delta$ places to the right of the
radix point then the actual end points can differ from the theoretically
optimal points by a small amount, $\Delta$, fixed by the grid spacing $2^{-\delta}$. For the
worst case, replace $d_{l,m}$ by $d_{l,m} - \Delta$ and replace $d_{r,m}$ by $d_{r,m} + \Delta$.

Now,

$$1 - e^{-}(m) \le dA \le 1 + e^{+}(m) \tag{5.56}$$

where

$$e^{+}(m) = m\Delta\tau + 1 / (2m - 1), \tag{5.57}$$

$$e^{-}(m) = m\Delta\tau + 1 / (2m + 1). \tag{5.58}$$

Note that due to the range restriction of d,

$$e^{+}(m_{min}) = m_{min}\,\Delta\tau \tag{5.59}$$

and

$$e^{-}(m_{max}) = m_{max}\,\Delta\tau. \tag{5.60}$$

Since we require

$$2^{-\delta} \le \frac{1}{\tau}\left(\frac{1}{m} - \frac{1}{m + 1}\right) \tag{5.61}$$

for all allowable m, the maximum value of $2^{-\delta}$ should be less than or equal to
$\tau/4$, and $\delta$ should therefore be greater than or equal to j + 2.

For given values of $\tau$ and $\Delta$

$$(A\hat{d})_{min} = 1 - e^{-}(m)_{max}, \tag{5.62}$$

$$(A\hat{d})_{max} = 1 + e^{+}(m)_{max}, \tag{5.63}$$

taken over all m in the range $1/\tau$ to $2/\tau$.
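The riser placement and the resulting bounds on the product dA can be tabulated exactly with rational arithmetic. The sketch below uses hypothetical function names; the placement error is passed in as a free parameter rather than derived from the divisor precision, and the end treads are given zero one-sided error from the range restriction, following Equations 5.59 and 5.60.

```python
from fractions import Fraction

def riser_positions(m, tau):
    """Optimal interval ends (d_left, d_right) for the tread A = m * tau."""
    d_left = Fraction(2) / (tau * (2 * m + 1))
    d_right = Fraction(2) / (tau * (2 * m - 1))
    return d_left, d_right

def product_range(tau, placement_error=Fraction(0)):
    """Worst-case lower and upper bounds of d * A over all treads m = 1/tau .. 2/tau."""
    m_min, m_max = int(1 / tau), int(2 / tau)
    worst_minus = worst_plus = Fraction(0)
    for m in range(m_min, m_max + 1):
        grid_term = m * placement_error * tau
        # The divisor range restriction removes the positive error of the
        # lowest tread and the negative error of the highest tread.
        e_plus = grid_term + (0 if m == m_min else Fraction(1, 2 * m - 1))
        e_minus = grid_term + (0 if m == m_max else Fraction(1, 2 * m + 1))
        worst_plus = max(worst_plus, e_plus)
        worst_minus = max(worst_minus, e_minus)
    return 1 - worst_minus, 1 + worst_plus

if __name__ == "__main__":
    tau = Fraction(1, 8)                 # three fraction bits in A (illustrative)
    print(riser_positions(12, tau))      # interval for A = 1.5
    print(product_range(tau))
    print(product_range(tau, Fraction(1, 512)))
```

With ideal riser placement the two bounds are symmetric about one; any placement error widens both sides by an amount proportional to m.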
5.4.3 An Estimate of the Cost of Table 1

We now derive an expression with which to estimate the minimum cost
in literals of Table 1 when structured as specified in Section 3.2.3. Let
the outputs of value A be of the form

$$A = a_{-2}\, a_{-1}\, .\, a_1\, a_2 \cdots a_j$$

and consider the d axis of Figure 19 to be equally divided in units of $2^{-\delta}$.
After all values of $d_{l,m}$ and $d_{r,m}$ are specified, each bit of A may be defined
by a sum-of-products of minterms of the form $k\,2^{-\delta}$.

We will now derive an estimate of the cost of implementing $a_i = f_i(d)$.
In the range $1 \le A \le 2$, each bit, $a_i$, is 1 in $2^{(i-1)}$ intervals, each of
length $2^{-i}$. Let $y'_{i,k}$ be the value of the bottom of the k-th interval along
the $d^{-1}$ axis for bit $a_i$ and let $y''_{i,k}$ be the top of the interval.
Thus,

$$y'_{i,k} = 1 + (2k - 1)\,2^{-i}, \tag{5.64}$$

$$y''_{i,k} = 1 + 2k\,2^{-i} \tag{5.65}$$

for i = 1, 2, ..., j and k = 1, 2, ..., $2^{(i-1)}$.

Let $X_{i,k}$ be the width of the corresponding interval along the d-axis,
thus

$$X_{i,k} = \frac{2^{-i}}{(4k^2 - 2k)\,2^{-2i} + (4k - 1)\,2^{-i} + 1}. \tag{5.66}$$

Let each interval of width $2^{-\delta}$ along the d-axis correspond to a
minterm, each with a fan-in of $\delta$. The number of minterms required to define
$X_{i,k}$ is

$$M_{i,k} = 2^{\delta} X_{i,k}; \tag{5.67}$$

the number of literals is

$$C_{i,k} = \delta\, M_{i,k}. \tag{5.68}$$

Using the same approximation to the minimization algorithm as
described in Section 5.3.3, the cost in literals, after minimization, for
implementing the $X_{i,k}$ interval is

$$C'_{i,k} = M'_{i,k}\, F'_{i,k} \tag{5.69}$$

where, with $\mu = I(\log_2 M_{i,k})$,

$$M'_{i,k} = M_{i,k} / 2^{\mu}$$

is an approximation to the number of prime implicants required and

$$F'_{i,k} = \delta - \mu \tag{5.70}$$

is an approximation to the average fan-in.

The cost of implementing a_i = f_i(d) is therefore

C'_i = \sum_{k=1}^{2^{(i-1)}} C'_{i,k}    (5.71)

The total number of prime implicants required is

M'_i = \sum_{k=1}^{2^{(i-1)}} M'_{i,k}    (5.72)

The cost for the entire table is therefore

C_{T1} = \sum_{i=1}^{j} (C'_i + M'_i)    (5.73)
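
Equations 5.64 through 5.73 can be evaluated numerically.  The following is a
minimal sketch, not the program used in this study.  It assumes that I( )
denotes the integer part (floor), that the number of minterms is
M_{i,k} = 2^δ X_{i,k}, and that μ is clamped at zero when an interval is
narrower than one minterm.  The function names are hypothetical.  No attempt is
made to reproduce the tabulated values of C_T1, which may depend on rounding
conventions not restated here.

```python
import math


def interval_width(i, k):
    """Width X_{i,k} along the d-axis of the k-th interval in which bit a_i is 1."""
    y_bottom = 1 + (2 * k - 1) * 2.0 ** -i   # Equation 5.64
    y_top = 1 + 2 * k * 2.0 ** -i            # Equation 5.65
    # d = 1/A, so the interval [y_bottom, y_top] on the A axis maps to this width
    # on the d-axis; algebraically this equals the closed form of Equation 5.66.
    return 1.0 / y_bottom - 1.0 / y_top


def table1_cost(j, delta):
    """Estimated cost of Table 1 (Equation 5.73) for j output bits and delta divisor bits."""
    total = 0.0
    for i in range(1, j + 1):
        for k in range(1, 2 ** (i - 1) + 1):
            minterms = 2 ** delta * interval_width(i, k)      # Equation 5.67
            # Assumption: I() is the floor, clamped at 0 for intervals below one minterm.
            mu = max(0, math.floor(math.log2(minterms)))
            primes = minterms / 2 ** mu                       # M'_{i,k}
            fan_in = delta - mu                               # Equation 5.70
            total += primes * fan_in + primes                 # C'_{i,k} + M'_{i,k}
    return total


if __name__ == "__main__":
    for j, delta in ((2, 4), (4, 6), (6, 8), (8, 10)):
        print(j, delta, round(table1_cost(j, delta), 1))
```

The pairs (j, δ) in the usage loop are the four Table 1 designs considered for
the hybrid structures in Section 6.4.1.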


6.   ESTIMATES  OF  COST  AND  PERFORMANCE 

6.1  Preliminary  Remarks 

In  this  section  we  use  the  analytic  tools  developed  in  Section  5 
together  with  the  definitions  in  Section  3  to  tabulate   samples  of  expected 
cost  and  performance.   Results  are  given  for  Type  2  structures,  Type  1 
structures,  and  finally  for  a  family  of  hybrid  structures.   Since  the  radix 
of  the  model  division  is  the  primary  determinant  of  performance,  for  each 
structure  we  first  consider  cost  versus  radix,  then  performance  versus  radix, 
and  finally  cost  versus  performance. 

Some of the results depend upon assignment of numerical values to
quantities used in the definitions of Section 3.  The values selected are
based upon experience in arithmetic unit design.  A different set of
realistic values would only shift the location of the cost-performance
curves and not materially alter their shape.  General conclusions
inferred from them would not change.

6.2  Type  2  Structures 
6.2.1  Cost  versus  Radix 

The cost of Table 2, C_T2, is given by

C_{T2} = \sum_{i=0}^{n-1} (C'(i) + M'(i))    (6.1)

where C'(i) is defined by Equation 5.39 and M'(i) is defined by Equation 5.40.

Tables 7a and 7b summarize cost versus radix for several values of Δrp.
Table 7a is for a divisor in the range 1/2 to 1 and Table 7b is for a
divisor in the range 3/4 to 9/8.  In all cases, ρ = 2/3, γ' = λ' = 1, β' = 1,
and α' = 0.  The quantity Δd is 2^{-δ} where δ is given for each entry in the
tables.

The limiting cases (4 and 8) are based upon the assumption that
the precision in rp and d is increased such that s_i' = s_i.  A near minimal
cost should lie between Cases 1 and 4 for the first divisor range or between
Cases 5 and 8 for the second divisor range.  The cost entries are given in
the following form:

18   (Prime  Implicants) 
111   (Literals  in  AND  Gates) 
129   (Total  Cost) 


Table 7a.  Cost of Table 2 versus Radix

             Case 1            Case 2            Case 3            Case 4
             Δrp = 1/16        Δrp = 1/32        Δrp = 1/64        Δrp = 0
    r        δ      Cost       δ      Cost       δ      Cost       δ      Cost

    4        5        18       5        15       3        14       ∞        13
                     111                90                81                74
                     129               105                95                87

   16        8       552       7       464       7       430       ∞       400
                    6170              5064              4646              4291
                    6722              5528              5076              4691

   64        9     10470       9      8792       9      8148       ∞      7595
                  160526            132578            121971            112928
                  170996            141370            130119            120523

  256       11    174597      11    146610      11    135871       ∞    126656
                 3381283           2802307           2582126           2394169
                 3555880           2948987           2717997           2520825

Table 7b.  Cost of Table 2 versus Radix

             Case 5            Case 6            Case 7            Case 8
             Δrp = 1/16        Δrp = 1/32        Δrp = 1/64        Δrp = 0
    r        δ      Cost       δ      Cost       δ      Cost              Cost

    4        5        10       4         8       3         8                 7
                      61                52                49                46
                      71                60                57                53

   16        7       296       6       261       6       247               234
                    3353              2920              2742              2583
                    3649              3181              2989              2817

   64        8      5597       8      4953       8      4684              4443
                   86870             75988             71481             67470
                   92467             80941             76165             71913

  256       10     93341      10     82590      10     78097             74090
                 1825332           1600477           1505408           1424130
                 1918673           1683067           1583505           1498220

6.2.2  Performance  versus  Radix 

The following equations from Section 3 are relevant to the
calculations in this section.

Operating Time of Model Division:

T_Q = T_{PREF} + T_{M1} + T_{T2} + T_R    (3.7)

Performance of Model Division:

P_Q = \frac{\log_2 r}{T_Q}    (3.8)

Operating Time of Full Precision Division:

T_D = \frac{T_A}{2} + \frac{T_C + T_Q}{\log_2 r}    (3.11)

Performance of Full Precision Division:

P_D = \frac{2 \log_2 r}{T_A \log_2 r + 2 (T_C + T_Q)}    (3.12)

Table 8 is a summary of P_Q and P_D for several radices with T_PREF = 3,
T_M1 = 0, T_T2 = 2, T_R = 1, T_A = 3, T_C = 4.  For these values T_Q = 6.  Note
that we have actually computed a best case for performance since we have
assumed that Table 2, even for the higher radices, can be implemented in two
delays (T_T2 = 2).


Table 8.  Performance of Type 2 Structure versus Radix

    r      P_Q (bits/delay)    P_D (bits/delay)

    4            .33                 .15
   16            .67                 .25
   64           1.00                 .32
  256           1.33                 .36
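
The entries of Table 8 follow directly from Equations 3.7, 3.8, and 3.12.  As a
minimal sketch (the function and argument names are hypothetical), the
calculation for the Type 2 structure can be written as:

```python
import math


def model_performance(radix, t_q):
    """P_Q in bits per logic delay (Equation 3.8)."""
    return math.log2(radix) / t_q


def full_precision_performance(radix, t_q, t_a=3, t_c=4):
    """P_D in bits per logic delay (Equation 3.12); defaults are the values assumed here."""
    bits = math.log2(radix)
    return 2 * bits / (t_a * bits + 2 * (t_c + t_q))


# Equation 3.7 with T_PREF = 3, T_M1 = 0, T_T2 = 2, T_R = 1.
T_Q = 3 + 0 + 2 + 1

if __name__ == "__main__":
    for radix in (4, 16, 64, 256):
        p_q = model_performance(radix, T_Q)
        p_d = full_precision_performance(radix, T_Q)
        print(radix, round(p_q, 2), round(p_d, 2))
```

Changing T_Q (for example to 3 N_A + 4 or 3 N_A + 6) gives the corresponding
figures for the other structures considered in this section.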

6.2.3  Cost  versus  Performance 

Neglecting the cost terms C_PREF, C_DEF, and C_R, the cost of
implementing a Type 2 structure is C_T2.  Table 9 summarizes the bounds on C_T2
versus performance of the full precision division.  The actual cost should
lie between the lower bound (LB) and the least upper bound (LUB)
corresponding to Case 1 in Table 7a and Case 5 in Table 7b.  These results are
plotted and discussed further in the summary and conclusions (Section 7).


Table 9.  Cost Bounds versus Performance for
Type 2 Model Division

                                         C_T2 (literals)
      P_D         Times        a = 1/2, b = 1         Times        a = 3/4, b = 9/8
  (bits/delay)   Increase        LB         LUB      Increase        LB         LUB

      .15          1.00           87         129          1           53          71
      .25          1.67        4,691       6,722         54        2,817       3,649
      .32          2.13      120,523     170,996       1385       71,913      92,467
      .36          2.40    2,520,825   3,555,880      28975    1,498,220   1,918,673


6.3  Type  1  Structures 

6.3.1  Cost  versus  Radix 

Neglecting the cost terms C_PREF, C_DEF, and C_R, the cost of
implementing a Type 1 model division is the sum of C_T1 and C_M1.  Values for
C_T1 are taken from the results given in Table 4.  The term C_M1 is computed
from Equation 3.6, namely,

CM1  "  i   CR  +  \  NB  CA  +  (HA  +  X)   MB  CSG  +  ^   CC" 


The  following  values  are  assumed: 


C_R = 10, C_A = 50, C_SG = 6, C_C = 4, N_B = 8.


Table  10  summarizes  the  results. 

Table 10.  Cost of Type 1 Structure versus Radix

    r      C_T1     j     N_A     C_M1     C_TOT

    4        28     3      2      1230      1258
   16       454     6      3      1708      2162
   64      2472     9      5      2634      5106

6.3.2  Performance  versus  Radix 

In computing the operating time for a Type 1 structure we assume
that T_PREF = 3, T_T2 = 0, T_R = 1, T_A = 3, T_C = 4, and T_M1 = 3 N_A, and
therefore from Equation 3.7,

T_Q = 3 N_A + 4.

Table 11 presents P_Q (Equation 3.8) and P_D (Equation 3.12) for the cases
which were described in Table 4.

Table 11.  Performance of Type 1 Structure versus Radix

    r      P_Q (bits/delay)    P_D (bits/delay)

    4            .20                 .12
   16            .31                 .17
   64            .32                 .19

6.3.3  Cost  versus  Performance 

Table 12 merges the computations of the previous sections.

Table 12.  Cost versus Performance for Type 1 Model Division

      P_D         Times                        Times
  (bits/delay)   Increase     C (literals)    Increase

      .12          1.00           1258          1.00
      .17          1.42           2162          1.72
      .19          1.58           5106          4.06


6.4  Hybrid Structures

6.4.1  Cost versus Radix and Number of Adders in Multiplier 1

For hybrid structures the cost is computed in several stages.  First,
C_T1 and the worst-case bounds on the transformed divisor range (a^T, b^T) are
computed for the cases of 1, 2, 3, and 4 adders in Multiplier 1.  The number
of adders, N_A, is the dominant factor in the performance of the model division
and furthermore specifies the cost of Table 1 under the assumptions presented
in Section 5.4.2.  Recall that the maximum uncertainty in A is 2^{-j}, where
j = 2 N_A; that the maximum uncertainty in d is 2^{-δ}, where δ = j + 2; and
that j and δ determine C_T1.

Next the transformed parameters are computed for each of the four
designs.  The cost equation for Table 2 is evaluated for each set of
transformed parameters, each for four different radices, to yield a total of
sixteen designs.  The total cost for each hybrid structure is taken to be

C_T1 + C_M1 + C_M2 + C_T2.

Table 13 summarizes the costs for the sixteen cases.  The quantities
a^T and b^T are defined by Equations 5.62 and 5.63, respectively, and C_T1 is
defined by Equation 5.73.  The terms C_M1 and C_M2 are computed from
Equation 3.6 with C_A = 50, C_R = 10, C_SG = 6, C_C = 4, and ε = 5.  The cost
term, C_T2, is computed from Equations 5.33, 5.39, and 5.40 with the
transformed parameters specified as follows:

m  m 

X     =  1/16,  a  =  0,   | 


max 


T    -6+1 
1=2 


=  2,  ArpT  =  2  J  5,  Ad  =  2  J~°'   YT  =  l/l6, 
P  =  2/3. 


Table 13.  Cost Computations for Hybrid Structures

  Table 1 Parameters        Case
                            No.     r    C_T1   C_M1   C_M2        C_T2            Total

  N_A = 1, j = 2, δ = 4,     1      4      17    512    332      10        57        928
  a^T = 27/32,               2     16      17    648    332     295      3233       4525
  b^T = 41/32                3     64      17    784    332    5597     84615      91345
                             4    256      17    920    332   93341   1787713    1882323

  N_A = 2, j = 4, δ = 6,     5      4     126    972    892       2        14       2006
  a^T = 123/128,             6     16     126   1220    892      76       843       3157
  b^T = 137/128              7     64     126   1468    892    1449     22059      25994
                             8    256     126   1716    892   24169    465627     492530

  N_A = 3, j = 6, δ = 8,     9      4     688   1444   1708       1         3       3844
  a^T = 507/512,            10     16     688   1824   1708      19       212       4451
  b^T = 521/512             11     64     688   2184   1708     367      5583      10530
                            12    256     688   2544   1708    6121    118070     129131

  N_A = 4, j = 8, δ = 10,   13      4    3469   1988   2780       0         0       8237
  a^T = 2043/2048,          14     16    3469   2460   2780       5        45       8759
  b^T = 2056/2048           15     64    3469   2932   2780      85      1286      10552
                            16    256    3469   3404   2780    1426     27463      38542

6.4.2  Performance versus Radix and Number of Adders in Multiplier 1

In computing the operating time for the hybrid structures we assume
that T_PREF = 3, T_T2 = 2, T_R = 1, T_A = 3, T_C = 4, and T_M1 = 3 N_A, and
therefore from Equation 3.7

T_Q = 3 N_A + 6.

Table 14 presents P_Q (Equation 3.8) and P_D (Equation 3.12) for the cases in
Table 13.

Table 14.  Performance Calculations for Hybrid Structures

  Case No.     P_Q (bits/delay)    P_D (bits/delay)

      1              .22                 .13
      2              .45                 .21
      3              .67                 .27
      4              .89                 .32

      5              .17                 .11
      6              .33                 .18
      7              .50                 .24
      8              .67                 .29

      9              .13                 .09
     10              .27                 .16
     11              .40                 .21
     12              .53                 .26

     13              .11                 .08
     14              .22                 .14
     15              .33                 .19
     16              .44                 .24


6.4.3  Cost versus Performance

Table 15 merges the cost and performance (P_D) data for the hybrid
structures.  These results are plotted and discussed further in the next
section.

Table 15.  Cost versus Performance for Hybrid
Model Division Structures

                  P_D         Times           C          Times
  Case No.    (bits/delay)   Increase     (literals)    Increase

      1           .13          1.00            928         1
      2           .21          1.62           4525         5
      3           .27          2.08          91345        98
      4           .32          2.46        1882323      2028

      5           .11          1.00           2006         1.0
      6           .18          1.63           3157         1.6
      7           .24          2.18          25994        13
      8           .29          2.63         492530       245

      9           .09          1.00           3844         1.0
     10           .16          1.78           4451         1.2
     11           .21          2.33          10530         2.7
     12           .26          2.88         129131        34

     13           .08          1.00           8237         1.0
     14           .14          1.75           8759         1.1
     15           .19          2.37          10552         1.3
     16           .24          3.00          38542         4.7
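
The "Times Increase" columns of Table 15 are ratios taken within each group of
four cases, relative to the radix 4 case of that group.  The following sketch
recomputes them from the tabulated P_D and cost values.  The data layout and
function name are hypothetical, and small differences from the printed ratios
can arise from rounding of the tabulated P_D values.

```python
# (P_D in bits/delay, cost in literals) for radix 4, 16, 64, 256, keyed by N_A.
TABLE_15 = {
    1: [(0.13, 928), (0.21, 4525), (0.27, 91345), (0.32, 1882323)],
    2: [(0.11, 2006), (0.18, 3157), (0.24, 25994), (0.29, 492530)],
    3: [(0.09, 3844), (0.16, 4451), (0.21, 10530), (0.26, 129131)],
    4: [(0.08, 8237), (0.14, 8759), (0.19, 10552), (0.24, 38542)],
}


def times_increase(rows):
    """Return (performance ratio, cost ratio) of each row relative to the first row."""
    base_p, base_c = rows[0]
    return [(p / base_p, c / base_c) for p, c in rows]


if __name__ == "__main__":
    for n_a, rows in TABLE_15.items():
        for p_ratio, c_ratio in times_increase(rows):
            print(n_a, round(p_ratio, 2), round(c_ratio, 1))
```

For N_A = 4, for example, a threefold gain in performance costs less than a
fivefold increase in literals, whereas for N_A = 1 a gain of about 2.5 costs a
factor of about two thousand.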

7.   SUMMARY  AND  CONCLUSIONS 

7.1  General  Summary 

In the summary and conclusions it is convenient to distinguish
between the definitive, synthetic, and analytic aspects of this study.  Sections
2 and 3 are definitive.  Section 2 defines the class of division techniques
to be studied and Section 3 defines the measure of cost and performance to be
applied.  It is noted that an advantage of the model division approach is
congruity with commonly used multiplication structures including the capacity
to form the partial remainders using non-propagating adders or subtractors.
The attendant disadvantages are the necessity to store two bits per quotient
digit and the requirement for a terminal step to convert the redundant to
non-redundant form.  The fact that for division, unlike multiplication, the
selection of the jth quotient digit cannot be straightforwardly overlapped
with the formation of the jth partial remainder, prompts consideration of
high-speed division techniques for the model.  Furthermore, the overhead
required to "call" and "return" from the model division prompts study of
higher radix structures which produce several bits per call.  A variable
radix block structure of a class of model division schemes is proposed for
study.

Section 4 describes algorithms with which to synthesize the most
complicated sub-blocks of the family of proposed quotient selectors: a
combinatorial network to produce an estimate of the reciprocal of the divisor
(Table 1), and a combinatorial network to generate a quotient digit when given
d and rp (Table 2).  Although these synthesis routines generate a logic
equation definition of the structure, the intent in this study is merely to
determine the cost; essentially the number of literals in the logic equations.
After the cost vs. performance behavior is sufficiently understood to permit
specification of parameters of a practicable model, the synthesis routines
may be applied as a first step in implementation.

Section 5 includes the bulk of the analytic work.  The section opens
with a tabulation of costs for several cases synthesized by the previously
defined algorithms.  But since there exist many variants of the model
division and since even computer synthesis in this case is expensive, the
numerical results and insight are applied to hypothesize formulas rather than
algorithms with which to estimate cost.  The formulas take account of the
ten variables of the model division.

Although one of the formulas is normalized with two empirically
defined quantities, it is assumed that these quantities are sufficiently
constant to permit meaningful prediction of cost for cases other than those
used in the normalization.  In Section 6, the formulas for both cost and
performance are applied to tabulate expected values of cost and performance.

The present section is an attempt to summarize the work in the
previous sections, to reach some conclusions about the feasibility of the
investigated quotient selection schemes, and to suggest areas for further
investigation.  The section is subdivided into consideration of numerical
cost and performance results, analytic results, and concludes with additional
remarks about areas for further research.

7.2  Cost  and  Performance 

Figure 20 is a graphical summary of the cost versus performance
estimates tabulated in Section 6.  The necessity for a five cycle semi-log
plot emphasizes the extreme range of costs and disappointing cost-performance
behavior.  It is apparent that many of the results are negative; they indicate
what not to attempt to implement.  The points on the graph are taken from
Tables 9, 12, and 15.  Points corresponding to the same type structure but
differing in radix are connected by straight-line segments.  Each of these
"curves" is labeled with a Roman numeral.

Figure 20.  Cost versus Performance for Samples of
Model Division Structures

Curves Ia and Ib, with points from Table 7b, are the lower and upper
bounds on the cost of a Type 2 structure (direct table look-up) for divisors
in the range (3/4, 9/8).  Curves IIa and IIb, with points from Table 7a, are
the lower and upper bounds for a similar structure with divisors in the range
(1/2, 1).  To a first approximation all four curves (log C) vary linearly
with performance and thus

Cost \propto 10^{kP}

where k is about 18.  This exponential behavior is not surprising considering
that performance varies as log r (see Equation 3.12) and that cost varies as
r^2 log r.  This latter statement is derived from Equations 5.39, 5.40, and
5.41.
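
The step from these two proportionalities to a straight line on a semi-log plot
can be made explicit.  The following is a sketch under the assumption, taken
from the statement above, that over the radices considered P is approximately
c log_2 r and C is approximately κ r^2 log_2 r; the constants c and κ are
introduced here only for illustration.

```latex
% Assumptions (illustrative): P \approx c \log_2 r and C \approx \kappa r^2 \log_2 r,
% with constants c > 0 and \kappa > 0.
\begin{align*}
r &= 2^{P/c}, \\
C &\approx \kappa \, \frac{P}{c} \, 2^{2P/c}, \\
\log_{10} C &\approx \frac{2 \log_{10} 2}{c} \, P + \log_{10} \frac{\kappa P}{c}.
\end{align*}
```

The first term is linear in P and the second grows only logarithmically, so
log C is nearly a straight line in P, with a slope that plays the role of k.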

The radix 4 Type 2 structure is quite practicable, requiring about
ten 10-input gates to yield performance of .15 bits per logic delay.  Assuming
10 ns logic, the scheme would generate 60 bits of quotient in about 4 μs.  A
radix 16 Type 2 structure theoretically increases performance by 5/3,
consequently reducing divide time, under the same assumptions, to 2.4 μs.  The
cost, however, increases over 50 times.
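
The divide times quoted above follow from the performance figures of Table 8.
As a small worked sketch (the function name is hypothetical):

```python
def divide_time_us(quotient_bits, bits_per_delay, delay_ns):
    """Time in microseconds to form quotient_bits at the given performance."""
    logic_delays = quotient_bits / bits_per_delay
    return logic_delays * delay_ns / 1000.0


if __name__ == "__main__":
    # 60-bit quotient, 10 ns per logic delay, P_D from Table 8.
    print(round(divide_time_us(60, 0.15, 10), 2))  # radix 4
    print(round(divide_time_us(60, 0.25, 10), 2))  # radix 16
```

At .15 bits per delay a 60-bit quotient takes 400 logic delays, and at .25 bits
per delay it takes 240.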

Statements about the radix 16 structure must be qualified by the
observation that due to fan-in and fan-out restrictions, the table cannot
actually be implemented in two levels of logic.  Since the divisor is
constant, the d portion of each prime implicant can be formed in a cascade of
many logic levels without degradation of performance.  But going to additional
levels to form functions of rp, although cost may be reduced, will decrease
performance below the ideal value assumed in Figure 20.  Justification for
a radix 16 Type 2 structure is discussed further in connection with a
"quotient lookahead" scheme mentioned in Section 7.5.  Type 2 structures beyond
radix 16 are too expensive to consider further.

Based  upon  Figure  20,  curve  III,  it  appears  that  a  Type  1  structure 
is  never  preferable  to  a  Type  2  structure.  Although  this  is  probably  true, 
the  Type  1  structures  might  be  studied  further  with  the  following  points  in 
mind: 

1.  The  structures  studied  here  employ  a  rather  conventional 
multiplier  requiring  one  cascaded  adder  per  two  bits  of 
multiplier.   Perhaps  faster  multipliers  may  be  found.   It 
is  doubtful,  however,  that  they  would  be  less  expensive. 

2.  For all structures studied the estimates of the partial
remainders have been converted to a conventional form.  For
structures requiring a transformation of rp, the assimilation
is performed after the multiplication.  The conversion
to conventional form has been required as a concession to
reducing the cost of Table 2.  For Type 1 structures, Table 2
is not required and thus perhaps the redundantly represented
result could be used directly by the shift gates in the
full precision arithmetic unit.  The elimination of the
conversion is roughly equivalent to eliminating one adder
from the multiplier structure.

The cost versus performance of the hybrid structures is shown in
curves IV-VII, corresponding to 1 through 4 adders in the multipliers, M1
and M2.  The curves initially rise slowly relative to the Type II curves but
soon become steep as the cost of Table 2 for the higher radices dominates.
The r^2 log r behavior of C_T2 is not easy to suppress.  Again, based upon
results shown in Figure 20, it appears that hybrid structures should not be
chosen over a Type 2 structure.

It is apparent from Equation 3.12 that P_D as a function of r has an
upper limit of 2/T_A.  This limit is the theoretical upper bound on the
performance of the iterative steps of multiplication.  With T_A = 3, the
theoretical ratio of performance of division to performance of multiplication
for cases in Figure 20 ranges from 0.09 to 0.53.  For practicable cases, the
range is 0.225 to 0.375.
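
The limit and the ratios for the practicable cases can be checked directly from
Equation 3.12.  The following short derivation assumes, as in Table 8, that T_Q
is held fixed as the radix grows, and takes the practicable cases to be the
radix 4 and radix 16 Type 2 structures (P_D = .15 and .25).

```latex
\begin{align*}
\lim_{r \to \infty} P_D
  &= \lim_{r \to \infty} \frac{2 \log_2 r}{T_A \log_2 r + 2 (T_C + T_Q)}
   = \frac{2}{T_A}, \\
\frac{P_D}{2 / T_A} &= \frac{T_A P_D}{2}
  \quad\Longrightarrow\quad
  \frac{3 \times 0.15}{2} = 0.225, \qquad \frac{3 \times 0.25}{2} = 0.375 .
\end{align*}
```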

7.3  Analytic Results

Only  a  few  of  the  cases  studied  appear  to  be  feasible.   But  negative 
results  are  valuable,  and  furthermore  it  should  be  kept  in  mind  that  the  main 
purpose  of  this  thesis  is  not  to  present  an  exhaustive  enumeration  of  quotient 
selection  schemes,  but  rather  to  develop  general  techniques  for  analysis. 

It is important to appreciate the generality of the extension of
Robertson's cost measurement (s_i) to the imprecise cases (s_i' and s_i'').
Although the estimate of cost as a function of s_i' is not rigorous and includes
empirically defined constants, the derivation of s_i' is rigorous.  The analysis
developed in Section 5.3.2 leads to a succinct statement of worst-case
precision requirements in rp and d, (d'' < a), and to insight into the effect
of the parameters of the model division on the cost of quotient selection.

The s_i' cost measurement is applicable to structures other than those
fitting within the structure of the model division shown in Figure 2.  For
example, as mentioned earlier, the treads of the staircase boundaries between
quotient regions may be viewed as comparison constants against which rp is
compared to determine in which quotient region it belongs.  The divisor range
is partitioned into intervals such that for each interval there is a single
comparison constant between each pair of adjacent quotient regions.  The
comparison constants could be stored in a read only memory.  A given divisor
value would determine a column of comparison constants which would be read out
to become one input to a set of comparators; the other input to the comparators
would be rp.  If c_i is the comparison constant between q(i) and q(i-1) then
q = k, where k is the greatest such that rp ≥ c_k.  The number of sets of
comparison constants has a lower bound of s_n' and upper bound of s_n''.  The
number of comparison constants in each set is n (assuming implementation of
only the first quadrant of the P-D plot).
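
The selection rule just described can be sketched in a few lines.  This is an
illustration only: the function name and the numerical constants are
hypothetical, chosen for a single divisor value d = 3/4 with r = 4, n = 2, and
ρ = 2/3 so that each constant lies inside the corresponding overlap region.
Only the first quadrant (non-negative rp) is handled.

```python
from bisect import bisect_right


def select_quotient_digit(rp_estimate, constants):
    """Return the greatest k such that rp_estimate >= c_k, or 0 if there is none.

    constants[k - 1] holds c_k, the comparison constant between q(k) and q(k-1),
    in ascending order; this is one column read out of the read only memory.
    """
    # bisect_right counts the constants that are <= rp_estimate.
    return bisect_right(constants, rp_estimate)


# Hypothetical column for d = 3/4: c_1 lies in [d/3, 2d/3] and c_2 in [4d/3, 5d/3].
COLUMN_D_3_4 = [0.375, 1.125]

if __name__ == "__main__":
    for rp in (0.2, 0.375, 1.0, 1.3):
        print(rp, select_quotient_digit(rp, COLUMN_D_3_4))
```

In hardware the n comparisons would be made simultaneously; the binary search
here is only a software convenience.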

Among others, the analytic results prompt the following observations:

1.  There are minimum requirements for the precision in the
estimates of rp and d.

2.  For given precision above the minimum required, there is a
limit, s_i', to the minimum number of comparison constants
required between q(i) and q(i-1).

3.  The actual number of steps, s_{i,act}, is greater than s_i' due to
discrete effects, i.e. due to the fact that the locations of
treads and risers are restricted to discrete values.

4.  The upper bound on s_{i,act}, including the discrete effects,
is s_i''.

5.  Increasing precision in the estimates of d and rp moves s_i' closer
to s_i and s_{i,act} closer to s_i', but by a decreasing amount.

7.4  Suggestions for Further Investigation

The  following  topics  for  further  investigation  have  emerged  in  the 
course  of  this  study.   The  order  of  listing  does  not  imply  any  priority. 

1.  Compare the cost and performance of the model division approach
to other division algorithms such as the Wallace algorithm [32] as
implemented in the IBM 360/91 [14], and division schemes in other
large machines such as the CDC 7600.

2.  Consider the use of a radix 4, Type 2 structure in a pipeline
arithmetic unit.  Assuming that the divisors and quotients may be streamed
along with the partial remainders, it appears that a set of the
inexpensive radix 4, Type 2 model division structures may be used
to effectively pipeline the division operation.  Multiplication and
division could be intermixed in the same pipeline; however, assuming
synchronous control, the clock frequency is limited by the quotient
selection time and thus the multiply time is degraded.

3.  Consider a "quotient lookahead" scheme.  Assume that each adder in
a cascade of adders is capable of performing a multiplication radix
2^k.  Then the shift gates for each adder may be controlled by a
model division of the same radix.  If the radix of the model is
greater than 2^k then more quotient digits are formed than can be
used in forming the present partial remainder.  It is conceivable,
however, that as soon as they are formed they could be used to set
shift gates to form the next partial remainder thus overlapping
control time.  For example, if k=2 but the model division is radix
16, control signals for the shift gates of two successive adders
are generated simultaneously.  If a radix 16 quotient selector is
coupled to the output of every adder in the cascade, then for each
addition/subtraction four bits are formed, two of which overlap with
the previously formed bits.  The formation of the jth partial
remainder may therefore be overlapped with formation of the (j+1)st
radix 4 quotient digit.  After startup, the effective control time
per addition would be the quotient selection time minus the add
time.  If the times were equal, then division could proceed at
multiply speed.

4.  Study the variation in cost of the entire arithmetic unit as a
function of ρ, the redundancy ratio.  Recall that ρ is one variable
in the equation for s_i'.  In all numerical work produced in this
study ρ = n/(r-1) = 2/3.  The decision to keep ρ constant excluded
the explicit study of radix 8, 32, and 128 for which there is no
integer n such that ρ = 2/3.

5.  Study a model division structure based upon simultaneous comparisons
of rp with comparison constants selected by the value of the divisor.

6.  Consider the engineering details of a radix 16, Type 2 structure.

7.  Program the correct algorithm (Appendix A) for producing the minimal
cost definition of a Table 2 structure.  Reference [34] defines the
minimization algorithm.  Compare the results with those produced by
the QS3 algorithm (Section 4).
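
The remark in item 4 about radix 8, 32, and 128 is easy to verify.  The sketch
below (with a hypothetical function name) looks for an integer n satisfying
n/(r-1) = 2/3 for each radix.

```python
def digit_bound_for_ratio(radix, rho_num=2, rho_den=3):
    """Return the integer n with n / (radix - 1) == rho_num / rho_den, or None."""
    numerator = rho_num * (radix - 1)
    if numerator % rho_den != 0:
        return None          # no integer n gives exactly this redundancy ratio
    return numerator // rho_den


if __name__ == "__main__":
    for radix in (4, 8, 16, 32, 64, 128, 256):
        print(radix, digit_bound_for_ratio(radix))
```

An integer n exists only when r - 1 is divisible by 3, which holds for the even
powers of two (4, 16, 64, 256) and fails for the odd powers (8, 32, 128).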

APPENDIX A

Algorithm for Generating Minimum Cost Sum-of-Products
Definitions of the q-Regions of Table 2

1.  Consider the P-D plot to be covered by a uniform grid with spacing of Δd
along the d-axis and with spacing Δrp along the rp-axis.  The
intersection of each grid line is defined by the ordered pair (d, rp) where d
is an integer multiple of Δd and rp is an integer multiple of Δrp.  Every
pair (d, rp) is representative of full precision quantities in the ranges
defined by Equations 2.11 and 2.14.  A sufficient condition for the choice
of λ, γ, α, β, Δrp, Δd, δ, and ε is that d'' (Equation 5.26) be greater
than a, the lower bound of the divisor range.  If Δd and/or Δrp are
smaller than necessary, the excess precision is removed by minimization.
However, the smaller Δd and Δrp, the closer the boundaries between the
q-regions may approach the theoretical limit, i.e. the smaller will be
the discrete effects.

2.  Every pair (d, rp) corresponds to a minterm, rp||d.  (See page 38 for
definition of the notation.)

3.  Let R_i be the set of minterms which are required to define q(i), i.e.
which must be assigned to the output function f_i.  Thus,

R_i = {rp||d | all or any part of the area corresponding
to (d, rp) is completely within the area defined by
the lines rp=(i+1-ρ)d, rp=(i-1+ρ)d, d=a, and d=b.}

Let T_i be the set of minterms which lie completely within the overlap
region between q(i) and q(i+1).  Thus,

T_i = {rp||d | the area corresponding to (d, rp) is
completely within the area defined by the lines
rp=(i+ρ)d, rp=(i+1-ρ)d, d=a, and d=b.}

Let D be the set of all minterms which correspond to (d, rp) which do not
represent area within the boundaries of the P-D plot, i.e. area not
within any q-region.

Assume a minimization algorithm such as described in Section 4.2.2 which
will accept both true minterms, Θ, and a set of don't care minterms, Δ,
of a given function.  The result of the minimization process is a minimal
set of prime implicants, Π.  Let Ω be the set of minterms implied by Π,
i.e. all minterms for which the function defined by the OR of the
elements of Π is true.

The following is the proposed algorithm for defining the output functions,
f_i, for i=0, 1, ..., n.

a)  Let Θ = R_0, Δ = T_0 ∪ D.

b)  Execute the minimization algorithm to produce P_0 = Π, and
construct M_0 = Ω.  Output function f_0 is the OR of the elements
of P_0.

c)  For i=1,2,...,n do the following:

Let Θ = R_i ∪ (T_{i-1} - (T_{i-1} ∩ M_{i-1})), and Δ = T_i ∪ D.  Execute
the minimization algorithm to produce P_i = Π and construct
M_i = Ω.  Output function f_i is the OR of the elements of P_i.
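
The bookkeeping of steps a) through c) can be sketched as follows.  This is an
illustration under stated assumptions, not the program proposed here: the
minimizer is passed in as a function returning the pair (Π, Ω), the placeholder
`trivial_minimize` performs no merging at all, T_n is taken to be empty, and
all names are hypothetical.

```python
def define_output_functions(R, T, D, minimize):
    """Apply steps a) to c) of Appendix A.

    R[i], T[i] are sets of minterms for i = 0..n (T[n] is assumed empty),
    D is the set of minterms outside every q-region, and
    minimize(true_set, dont_care_set) returns (prime_implicants, covered_minterms).
    """
    P, M = [], []
    for i in range(len(R)):
        if i == 0:
            theta = set(R[0])                                  # step a)
        else:
            # Overlap minterms not already covered by f_{i-1} must go to f_i.
            uncovered = T[i - 1] - (T[i - 1] & M[i - 1])
            theta = set(R[i]) | uncovered                      # step c)
        delta = set(T[i]) | set(D)
        primes, covered = minimize(theta, delta)               # step b)
        P.append(primes)
        M.append(set(covered))
    return P, M


def trivial_minimize(true_set, dont_care_set):
    """Placeholder minimizer: every true minterm is its own implicant."""
    return sorted(true_set), set(true_set)


if __name__ == "__main__":
    R = [{"m0"}, {"m2"}]
    T = [{"m1"}, set()]
    D = {"mx"}
    P, M = define_output_functions(R, T, D, trivial_minimize)
    print(P)
```

With the placeholder minimizer the overlap minterm is not absorbed by f_0, so it
is passed on as a true minterm of f_1; a real minimizer may absorb it into f_0
whenever that lowers the cost.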
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APPENDIX  B 

Example of Results of QSU and Minimization Program.

Note:

r = 4, n = 2, a = 1/2, b = 1

P = P1 P2 . P3 P4 P5 P6
d = .d1 d2 d3 d4

In the following, '1' implies that the variable is present in true form;
'0' implies that the variable is present in complement form; 'x' implies that
the variable is absent. Variable d1 is deleted by inspection.


Minimal cost prime implicants for q(0):

P1  P2  P3  P4  P5  P6  d1  d2

0   0   0   x   0   0   x   x
0   0   0   0   x   x   x   x
0   0   0   x   x   0   x   1
0   0   0   x   0   x   x   1


Minimal cost prime implicants for q(1):

P1  P2  P3  P4  P5  P6  d1  d2  d3  d4

0   0   0   1   x   1   x   0   x   x
0   0   0   1   1   x   x   0   x   x
0   0   x   1   1   1   x   1   x   x
0   0   1   0   x   x   x   x   x   x
0   0   1   x   x   0   x   x   1   x
0   0   1   x   0   x   x   x   x   1
0   0   1   x   x   x   x   x   1   x
0   0   1   x   x   x   x   1   x   x
0   1   0   0   0   0   x   x   1   1
0   1   0   0   x   x   x   1   x   x
0   1   0   x   x   0   x   1   1   x
0   1   0   x   0   x   x   1   1   x
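
The '1', '0', 'x' notation of the tables above can be checked mechanically. The following is a minimal sketch, assuming each implicant is written as a string over the variable order P1 ... P6 d1 d2 used for q(0); the function names are hypothetical and are not part of the QSU or minimization programs.

```python
from typing import Iterable

def cube_covers(cube: str, bits: str) -> bool:
    """True if the input bit string satisfies the implicant written as a cube."""
    if len(cube) != len(bits):
        raise ValueError("cube and input must have the same number of variables")
    # 'x' means the variable is absent; '0' or '1' must match the input bit.
    return all(c == "x" or c == b for c, b in zip(cube, bits))

def output_is_true(cubes: Iterable[str], bits: str) -> bool:
    """OR of the implicants: the output function for one quotient digit."""
    return any(cube_covers(cube, bits) for cube in cubes)

def literal_count(cube: str) -> int:
    """Number of variables present in the implicant."""
    return sum(1 for c in cube if c != "x")

# Implicants for q(0) as listed above, variable order P1..P6 d1 d2.
Q0_CUBES = ["000x00xx", "0000xxxx", "000xx0x1", "000x0xx1"]

print(output_is_true(Q0_CUBES, "00000010"))  # P = 00.0000, d1 d2 = 1 0
print(output_is_true(Q0_CUBES, "00100010"))  # P = 00.1000, d1 d2 = 1 0
```

The first input is covered by the second implicant, so q(0) is selected; the second input has P3 = 1 and matches none of the q(0) implicants.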






VITA 


Daniel Ewell Atkins, III was born in Jacksonville, Florida on
April 12, 1943. He received the B.S. degree in Electrical Engineering from
Bucknell University, Lewisburg, Pa., in 1965; the M.S. degree in Electrical
Engineering from the University of Illinois, Urbana, in 1967; and the Ph.D. in
Computer Science from the University of Illinois in 1970.

Between 1963 and 1967 he held summer positions with the Freas-Rooke
Computing Center, Bucknell University, and the U.S. Naval Ordnance
Laboratory, White Oak, Md. While attending the University of Illinois he
was employed as a research assistant in the Department of Computer Science.
He designed the floating point arithmetic units for the Illinois Pattern
Recognition Computer (Illiac III) under the direction of Professor Bruce H.
McCormick, and conducted research in the area of computer arithmetic under
the direction of Professor James E. Robertson. Mr. Atkins has published
papers evolving from this work in the IEEE Transactions on Computers of
October 1968 and August 1970.

Mr. Atkins is a member of Tau Beta Pi, Sigma Xi, Pi Mu Epsilon,
Pi Delta Epsilon, the Association for Computing Machinery, the Institute
of Electrical and Electronics Engineers, and the American Association of
University Professors.


Form AEC-427
(6/68)
AECM 3201

U.S. ATOMIC ENERGY COMMISSION
UNIVERSITY-TYPE CONTRACTOR'S RECOMMENDATION FOR
DISPOSITION OF SCIENTIFIC AND TECHNICAL DOCUMENT
(See Instructions on Reverse Side)

1. AEC REPORT NO.

Report No. 397
COO-1018-1204

2. TITLE

A STUDY OF METHODS FOR SELECTION OF
QUOTIENT DIGITS DURING DIGITAL DIVISION

3. TYPE OF DOCUMENT (Check one):

[X] a. Scientific and technical report
[ ] b. Conference paper not to be published in a journal:
       Title of conference
       Date of conference
       Exact location of conference
       Sponsoring organization
[ ] c. Other (Specify)

4. RECOMMENDED ANNOUNCEMENT AND DISTRIBUTION (Check one):

[X] a. AEC's normal announcement and distribution procedures may be followed.
[ ] b. Make available only within AEC and to AEC contractors and other U.S. Government agencies and their contractors.
[ ] c. Make no announcement or distribution.

5. REASON FOR RECOMMENDED RESTRICTIONS:

6. SUBMITTED BY: NAME AND POSITION (Please print or type)

Daniel E. Atkins, Research Assistant

Organization

Department of Computer Science
University of Illinois

Signature

Date

May 28, 1970

FOR AEC USE ONLY

7. AEC CONTRACT ADMINISTRATOR'S COMMENTS, IF ANY, ON ABOVE ANNOUNCEMENT AND DISTRIBUTION
RECOMMENDATION:

8. PATENT CLEARANCE:

[ ] a. AEC patent clearance has been granted by responsible AEC patent group.
[ ] b. Report has been sent to responsible AEC patent group for clearance.
[ ] c. Patent clearance not required.




